Global load based file allocation among a plurality of geographically distributed storage nodes

ABSTRACT

A method for balancing loads on a plurality of geographically distributed storage nodes coupled to a communications network, includes: receiving a request from a user device to download a data file; identifying all storage nodes from a plurality of geographically distributed storage nodes containing the requested data file; selecting a first storage node containing the requested file to serve the request; and determining if the first storage node is too busy, wherein if the first storage node is determined not to be too busy, directing the request to the first storage node, otherwise searching for a second storage node containing the requested data file that is not too busy and, if the second storage node is found, directing the request to the second storage node.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 60/968,848 filed Aug. 29, 2007, the content of which is incorporated by reference herein in its entirety.

FIELD OF THE INVENTION

The present invention relates generally to data storage, and more particularly to a method and system for storing, accessing and manipulating data in a data communications network.

BACKGROUND OF THE INVENTION

In computing, a file system can store and organize data files in order to make the data files easier to find and access. File systems may use a data storage device such as a hard disk or CD-ROM to maintain the physical location of computer files. A file system may provide access to data on a file server by acting as a client for a network protocol. In other words, a file system can be a set of abstract data types that are implemented for the storage, hierarchical organization, manipulation, navigation, access, and retrieval of data.

A network file system is a file system that acts as a client for a remote file access protocol, providing access to files on a server. A network file system can be any computer file system that supports access of files over a computer network. A network file system may be distributed over clients, servers, and storage devices dispersed among the machines distributed in an intranet or over the internet. Service activity occurs across the network, and instead of a single centralized data repository, the system may have multiple and independent storage devices. In some network file systems, servers run on dedicated machines, while in others a machine can be both a server and a client. A network file system can be implemented as part of a distributed operating system, or by a software layer that manages the communication between conventional operating systems and file systems. A network file system may appear to its users to be a conventional, centralized file system. The multiplicity and dispersion of its servers and storage devices can be made invisible, and the client interface used by programs should not distinguish between local and remote files. It is up to the network file system to locate the files and to arrange for the transport of data.

A storage delivery network (SDN) may include a network file system that is used for scalable networking applications. SDNs can be composed of one or more storage nodes, each node containing one or more servers for storing data files and at least one transfer server for serving files and/or media over a network. In one embodiment, the transfer server and a storage server may be implemented by a single server.

SUMMARY OF THE INVENTION

Embodiments of the invention are directed to methods and systems for storing, accessing, manipulating and controlling folders and/or files over the internet by utilizing three control layers: a virtual layer, a logical layer and a physical layer. As known in the art, a “folder” may store one or more “files” and a “file” typically, but not necessarily, stores a predetermined amount of information, data or media content (e.g., a single document, movie, or music/song file).

In one embodiment of the present invention, a file system is accessed, controlled and manipulated over the internet via requests to web services (e.g., SOAP or REST). These web services interact with one or more database servers, referred to herein as file system database servers or “core servers,” which provide for virtualization of the file system and mapping of a virtual layer to a logical layer, which in turn is mapped to a physical layer.

In one embodiment, user information such as file names, path names, metadata, etc. is stored in a virtual layer or virtual file system (VFS), which allows users to share access to the same common physical file but assign it individual names, locations and metadata (extended properties) within the system. During normal access (e.g., move, copy, delete, rename, etc.), the VFS increases speed of file manipulation by eliminating the necessity of “touching” the physical file itself. Rather, the user's directory structure is controlled through the file system database server and the data is stored within a series of tables. A web services layer of the system presents the accessing user a tree-structured file system and allows the user to manipulate the system in a familiar fashion.

In a further embodiment, access to a user's file system is secured so that only authorized users with the correct permissions, in accordance with each user's account information (e.g., Application Name/User Name), can access the directory structure and the files within each folder. In extended circumstances, users may have the ability to create “public shares” and grant or restrict access to shared files or folders by entities external to the SDN, as the user sees fit.

In a further embodiment, names or references to files stored within the VFS are mapped to references stored in a logical file system (LFS). This is the layer which allows the system to de-duplicate the common elements of user inputted files as opposed to simple de-duplication of the file itself. Files have certain intrinsic properties that do not change from user to user, such as embedded metadata, file size and file type. Once a file is uploaded to the system, this information typically does not change, though it may be overridden by the user. Information stored within the LFS is intrinsic to the file, and when a file has different information stored within it, even though the files may appear to be identical to an end user, the virtue of the different embedded data makes them different for purposes of de-duplication. However, as explained in further detail below, if a user chooses to over-ride metadata (e.g., run time of a video) or other intrinsic information contained within a file, the newly created metadata or information is stored in a separate metadata table in the VFS and does not affect the metadata stored in the LFS. Thus, the presence of both the VFS and LFS allows de-duplication of the common elements of a file (e.g., the actual content itself) even if a user desires to over-ride other portions of the file such as metadata. From the perspective of the user, the file has been customized to his or her preference. However, for storage purposes the file itself can still be stored and referenced by a plurality of users.
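By way of illustration only, the relationship between intrinsic LFS metadata and per-user VFS overrides may be sketched as follows. The table contents, the field names and the function `effective_metadata` are hypothetical and are not taken from the specification; the sketch merely shows a user-specific override being applied without altering the shared LFS record.

```python
# Hypothetical in-memory stand-ins for the LFS metadata table (shared,
# keyed by logical file ID) and the VFS override table (per user).
lfs_metadata = {
    101: {"file_type": "video/mp4", "run_time_s": 125, "size_bytes": 1048576},
}
vfs_overrides = {
    ("alice", 101): {"run_time_s": 120},  # alice overrides the run time only
}


def effective_metadata(user, lfid):
    """Return the LFS metadata for lfid with the user's VFS overrides on top."""
    merged = dict(lfs_metadata[lfid])  # copy, so the shared LFS record is never mutated
    merged.update(vfs_overrides.get((user, lfid), {}))
    return merged
```

In this sketch, two users referencing the same logical file may see different run times while the single LFS record, and hence the de-duplicated physical file, remains unchanged.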

Beneath the LFS lies the physical file system (PFS) where the files actually reside. The files are stored within one or more servers within one or more nodes. In one embodiment, the logical file system need only contain information determining in which node(s) each of the files is stored, whereas each node contains the catalog of where each file exists on which server(s) within that node. In other words, each node autonomously controls the placement of files within itself and the LFS simply knows that the file exists somewhere within that node. As used herein, a “node” refers to a storage element containing one or more storage devices for storing files therein and providing access to files (e.g., uploading and downloading of files). In one embodiment, a node contains one or more storage servers, a node manager server for controlling and keeping track of where each file resides within the node, and one or more transfer servers (e.g., web servers) for sending or receiving files to end users.
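The two-level lookup described above may be sketched, purely as an example, with the LFS recording only which nodes hold a file and each node's own catalog recording the server on which the file resides. The dictionaries, node names, server names and the function `locate` are assumptions introduced for illustration.

```python
# Hypothetical LFS table: logical file ID -> nodes known to hold the file.
lfs_nodes = {101: ["node-A", "node-B"]}

# Hypothetical per-node catalogs (kept by each node manager, not by the LFS):
# logical file ID -> storage server within that node.
node_catalogs = {
    "node-A": {101: "storage-3"},
    "node-B": {101: "storage-1"},
}


def locate(lfid):
    """Resolve lfid to (node, server) pairs by consulting the LFS, then each node."""
    return [(node, node_catalogs[node][lfid]) for node in lfs_nodes.get(lfid, [])]
```

Because the server-level placement lives only in the node catalog, a node in this sketch could move a file between its own servers without any change to the LFS table.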

In one embodiment, when a store, put or upload request (collectively referred to as an “upload” request) is received by the system, the VFS determines which user is adding the file and determines, for example, via geocode, node storage availability, and other criteria, which node the user should upload to and redirects the user to the proper node for upload. The user's connection to the core server is then severed and the connection is established with the designated node, which begins accepting the packets of the file. When the file upload is complete, a transfer server at the node to which the file has been uploaded makes a request back to the VFS initiating an entry into the user's VFS, creating a folder path or virtual file for the user and assigning the new entry a temporary logical file ID (LFID) so that the user can access the newly uploaded file immediately. The transfer server then notifies the node's internal processing system by adding an entry into a processing queue.

The processing system then processes the file by applying a hashing algorithm to it, e.g., the MD5 file hashing algorithm. Once this hash is determined, a “media key” is created by logically combining the hash with the file's size in bytes. The processing system then communicates with the LFS which then determines whether or not an identical file already exists within the system. The LFS checks its database tables to determine if there is an identical media key. The LFS then determines whether the file exists “near enough” to the user requesting upload of the file via geocode comparisons. If the file does exist at a “near enough” node, the LFS notifies the VFS and the temporary LFID referenced by the VFS is replaced with the permanent LFID associated with the identical file stored in the “near enough” node. If an identical file is online and is “near enough,” the LFS informs the node to mark the recently uploaded file for deletion and temporarily stores the file at a designated storage location. All uploaded files marked for deletion are cleaned up (deleted) by a daemon which crawls the system as a backend process that is transparent to the user.
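A minimal sketch of one way to form such a media key is given below. The specification states only that the hash is “logically combined” with the file size; joining the hexadecimal MD5 digest and the byte count with a hyphen is an assumption made for this example, as is the function name `media_key`.

```python
import hashlib


def media_key(data: bytes) -> str:
    """Combine the MD5 digest of the content with its size in bytes.

    The hyphen-joined format is an illustrative assumption; the source
    does not specify how the hash and the size are combined.
    """
    digest = hashlib.md5(data).hexdigest()
    return f"{digest}-{len(data)}"
```

Under this sketch, two uploads yield the same media key only if their content hashes and their sizes both match, which is the condition the LFS would test when looking for an identical existing file.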

If the LFS determines that the file does not previously exist in any network node, or that the file does not exist “near enough,” or that the file is offline, it then extracts metadata from the file and creates logical file tags for storage in a metadata table within the LFS. The LFS then assigns a new permanent LFID to the new file and requests the designated node to place a copy of the file within a selected storage server and update the node manager database with the new LFID and location of the new physical file. The LFS also notifies the VFS of the new LFID assigned to the new file.

In a further embodiment, the invention determines whether a node or other network resource is “near enough” by determining a physical location associated with a user computer by translating its IP address into a geocode and, thereafter, comparing this geocode with a geocode associated with one or more nodes or other network resources. The method and system of the invention then assign one or more nodes or network resources (e.g., servers) to service the user's request (e.g., an upload or download request) based at least in part on the location of the network resource relative to the location of the user's computer as determined by respective geocodes associated with the user's computer and the network resource.

As used herein, a “geocode” refers to any code or value which is indicative of a geographic location of an object, device or entity associated with the geocode. One type of geocode that is known in the art is used, for example, by the U.S. postal service to assign codes to geographic regions or areas. In general, the geocode is a code that represents a geospatial coordinate measurement of a geographic location and time. A geocode representation can be derived, for example, from the following geospatial attributes: latitude, longitude, altitude, date, local time, global time and other criteria, such as how the area is coded (e.g., number, letter, mixture of both, or other), which part of the earth is covered (e.g., whole earth, land, water, a continent, a country, etc.), what kind of area or location is being coded (e.g., country, county, airport, etc.), and/or whether an area or point is being coded. Generally, a geocode is a number representation that takes into account some or all of the above criteria.

Every computer or device that communicates over the Internet has a unique Internet Protocol (IP) address assigned to it. Computers and devices residing within a pre-determined geographic region or area are typically assigned a specified range of IP addresses. For example, all computers within Japan may have IP addresses in the range of 43.0.0.0-43.255.255.255 (Source: IANA, Japan Inet, Japan (NET-JAPAN-A)).

In one embodiment, when a user or customer makes an upload (a.k.a., “put” or “store”) or download (a.k.a., “get” or “retrieve”) request, via a web services interface, for example, the request is received by a file system server (a.k.a., “core system server”) which translates the IP address associated with the incoming request into a geocode. In one embodiment, the system looks up a table that correlates IP addresses with geocodes, or IP address ranges with geocode ranges. After the IP address has been translated into a geocode, the system compares the geocode to the geocodes that have been assigned to network resources (e.g., a storage node) within the network and determines, algorithmically, which resources are “nearest” the requester. If only one resource is “near enough,” the user is redirected to that resource. If multiple resources are “near enough,” the system may determine which of the resources is currently experiencing the lightest volume of requests (e.g., via updatable polling) and redirect the requester to that resource. Or, in an alternative implementation, the requester may be directed to the absolute nearest resource, regardless of current volume. In one embodiment, the core system determines if a network resource is “near enough” by subtracting the geocode identified for the incoming request from the geocode associated with the target resource and determining if the absolute value of the difference exceeds a predetermined threshold. In another embodiment, whether the requester's geocode indicates the requester is near enough to a resource can simply be determined by accessing a look up table (e.g., a node priority list) which assigns nodes to geocode ranges.
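The translation and comparison steps described above may be sketched as follows, under stated assumptions. The integer geocode values, the node names, the load figures and the threshold of 50 are invented for illustration; only the 43.0.0.0-43.255.255.255 range (written here as 43.0.0.0/8) comes from the text. The sketch uses the absolute-difference test and, where several nodes qualify, the lightest-load choice.

```python
import ipaddress

# Hypothetical IP-range-to-geocode table; the geocode integers are made up.
IP_TO_GEOCODE = [
    (ipaddress.ip_network("43.0.0.0/8"), 500),
    (ipaddress.ip_network("192.0.2.0/24"), 100),
]
# Hypothetical node geocodes and current request volumes.
NODE_GEOCODES = {"node-A": 110, "node-B": 480, "node-C": 520}
NODE_LOAD = {"node-A": 5, "node-B": 40, "node-C": 12}


def ip_to_geocode(ip):
    """Translate an IP address into a geocode via the range table."""
    addr = ipaddress.ip_address(ip)
    for network, geocode in IP_TO_GEOCODE:
        if addr in network:
            return geocode
    return None  # address not covered by the table


def near_enough_nodes(geocode, threshold):
    """Nodes whose geocode differs from the requester's by at most threshold."""
    return [n for n, g in NODE_GEOCODES.items() if abs(g - geocode) <= threshold]


def pick_node(ip, threshold=50):
    """Pick the least-loaded near-enough node, or None if there is none."""
    geocode = ip_to_geocode(ip)
    if geocode is None:
        return None
    candidates = near_enough_nodes(geocode, threshold)
    if not candidates:
        return None
    return min(candidates, key=lambda n: NODE_LOAD[n])
```

For a request from 43.1.2.3, both node-B and node-C fall within the threshold in this sketch and the less loaded node-C is chosen; the alternative implementation mentioned above would instead sort the candidates by geocode difference alone.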

In one embodiment, if the user request is an upload request, when determining which network storage nodes are “closest,” an amount of available storage at each storage node is taken into consideration as a factor. After the closest storage node has been selected by the core system, the user request is redirected to that node and the user may immediately begin to upload his or her file(s) to an upload server at the designated node. When an incoming file is received, the upload server temporarily stores the file in an upload cache memory while a processing system within the node processes the received file. This allows the user to access the newly uploaded file immediately via a download server at the node, if desired. Thus, there is no delay due to file processing.

In one embodiment, initial download requests (e.g., retrieve or “get” requests) associated with a user IP address are received via a web services interface by the core system. Via geocode comparison, for example, the core system will identify the closest storage node containing the requested file and redirect the user request to that node. It should be understood that even though an online node that stores the requested file is deemed to be “closest,” this does not necessarily mean it is “near enough” to the user. The designated node can then start transmitting the requested file to the user with minimum latency. As the transmission is taking place, a processing system (e.g., one or more servers) within the node determines whether the node is “near enough” based on a geocode associated with the user computer making the download request.

In one embodiment, a difference in geocode values associated with the user's computer and the storage node is indicative of a distance between the node and the requesting computer or device. If the distance exceeds a predetermined threshold, the node notifies the core system of the distance value. The core system will then determine if there are other online nodes that are “near enough” to the user and whether any of those nodes contain a copy of the requested file (in the event that a previously offline file recently came online). If there are no “near enough” online nodes that contain the file, the core system will direct the previously designated node to transfer the file to the closest of the “near enough” nodes. If there is a “near enough” online node that contains a copy of the file, the user will be redirected immediately prior to beginning his or her download. In an alternative implementation, all download requests begin at the core and are thereafter directed to the proper node. In an alternative embodiment, whether a storage node is “near enough” to the user computer may be determined by looking up a node priority table to see whether a geocode or geocode range associated with the user computer has been assigned for that node.
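The decision sequence of this embodiment may be sketched, as an example only, by a function that returns the action the core system would take. The function name `plan_download`, the action labels and the use of integer geocodes are assumptions; the fall-through case in which no node at all is near enough is not addressed in the text and is handled here by continuing to serve from the original node.

```python
def plan_download(user_geocode, serving_node, node_geocodes, nodes_with_file, threshold):
    """Return (action, node) for a download already assigned to serving_node."""

    def distance(node):
        return abs(node_geocodes[node] - user_geocode)

    if distance(serving_node) <= threshold:
        return ("serve", serving_node)  # designated node is already near enough

    # Near-enough nodes, closest first.
    near = sorted((n for n in node_geocodes if distance(n) <= threshold), key=distance)
    for node in near:
        if node in nodes_with_file:
            return ("redirect", node)  # a near-enough node already holds a copy
    if near:
        return ("copy_to", near[0])  # transfer the file to the closest near-enough node
    return ("serve", serving_node)  # assumption: no near-enough node exists at all
```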

In one embodiment, after a near enough node has been identified in response to an initial download request, as described above, subsequent requests by the same computer system for the same file will not go to the core system via a web services interface. Instead, the customer application interface keeps a record of the previous request and the previously identified “near enough” node, and redirects any subsequent requests for the same file by the same IP address directly to that “near enough” node. In one embodiment, a permanent redirection only takes place if a “near enough” node is found. If a requested file exists in the system, but not in a near enough node, the redirect is temporary.

In a further embodiment, additional information that can be included within a geocode, or become part of the “near enough” or distance calculation, may include, for example, quality of service (QoS) as determined by a service level agreement (SLA) associated with a particular user, number of accesses to the requested file during a pre-specified period, number of accesses by the particular user, bandwidth speeds and availability, relative connectivity (i.e., how busy a node is) and master internet trunk information.

In another aspect of the invention, a method for balancing loads on a plurality of geographically distributed storage nodes coupled to a communications network, includes: receiving a request from a user device to download a data file; identifying all storage nodes from a plurality of geographically distributed storage nodes containing the requested data file; selecting a first storage node containing the requested file to serve the request; and determining if the first storage node is too busy, wherein if the first storage node is determined not to be too busy, directing the request to the first storage node, otherwise searching for a second storage node containing the requested data file that is not too busy and, if the second storage node is found, directing the request to the second storage node.
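The steps of this method may be sketched as follows. The sketch assumes that “too busy” means a load figure at or above a numeric threshold and that the first storage node is simply the first entry of an ordered candidate list (for example, ordered by proximity); neither assumption is stated in this summary, and the function name `route_request` is hypothetical. The case in which no second node is found is not specified above, so the sketch returns None.

```python
def route_request(file_id, nodes_by_file, load, busy_threshold):
    """Return the node that should serve file_id, or None if none qualifies."""
    # Identify all storage nodes containing the requested file.
    candidates = nodes_by_file.get(file_id, [])
    if not candidates:
        return None

    # Select a first node and direct the request there unless it is too busy.
    first = candidates[0]
    if load[first] < busy_threshold:
        return first

    # Otherwise search the remaining holders for a node that is not too busy.
    for node in candidates[1:]:
        if load[node] < busy_threshold:
            return node
    return None  # assumption: behavior when every holder is too busy is unspecified
```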

In a further aspect, a system for balancing loads on a plurality of geographically distributed storage nodes coupled to a communications network, includes: a web services interface operable to receive a request from a user device to download a data file, wherein the user device is associated with an internet protocol (IP) address; a database containing one or more data tables correlating a plurality of IP addresses to a plurality of geocode values, each geocode value corresponding to a specified geographic region, and assigning at least one of a plurality of geographically distributed storage nodes to serve user devices associated with each geocode value, the one or more tables further identifying one or more storage nodes wherein each of a plurality of data files are stored; and a server coupled to the database, the server including: a first module for receiving a request from a user device to download a data file; a second module for identifying all storage nodes from a plurality of geographically distributed storage nodes containing the requested data file; a third module for selecting a first storage node containing the requested file to serve the request; and a fourth module for determining if the first storage node is too busy, wherein if the first storage node is determined not to be too busy, directing the request to the first storage node, otherwise searching for a second storage node containing the requested data file that is not too busy and, if the second storage node is found, directing the request to the second storage node.

In another embodiment, the invention provides a computer readable medium storing computer executable instructions that when executed perform a process for balancing loads on a plurality of geographically distributed storage nodes, the instructions including: a first code module for receiving a request from a user device to download a data file; a second code module for identifying all storage nodes from a plurality of geographically distributed storage nodes containing the requested data file; a third code module for selecting a first storage node containing the requested file to serve the request; and a fourth code module for determining if the first storage node is too busy, wherein if the first storage node is determined not to be too busy, directing the request to the first storage node, otherwise searching for a second storage node containing the requested data file that is not too busy and, if the second storage node is found, directing the request to the second storage node.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure, in accordance with one or more embodiments, is described in detail with reference to the following figures. The drawings are provided for purposes of illustration only and merely depict typical or exemplary embodiments of the disclosure. These drawings are provided to facilitate the reader's understanding of the disclosure and shall not be considered limiting of the breadth, scope, or applicability of the disclosure. It should be noted that for clarity and ease of illustration these drawings are not necessarily made to scale.

FIG. 1 illustrates an exemplary storage delivery network (SDN) system in accordance with one embodiment of the invention.

FIG. 2 illustrates a block diagram of an SDN in accordance with one embodiment of the invention.

FIG. 3 illustrates exemplary directory structures for folders and files uploaded by two exemplary end users in accordance with one embodiment of the invention.

FIG. 4 illustrates exemplary virtual file system (VFS) tables that store user information corresponding to the directory structures and path names of FIG. 3 in accordance with one embodiment of the invention.

FIG. 5 illustrates exemplary logical file system (LFS) tables in accordance with one embodiment of the invention.

FIG. 6 illustrates an exemplary Physical File Table which is stored in a node manager database server in accordance with one embodiment of the invention.

FIG. 7A illustrates an exemplary storage node architecture in accordance with one embodiment of the invention.

FIG. 7B illustrates a flowchart of an exemplary process for moving requested files from one storage node to another.

FIG. 8A illustrates a flowchart of an exemplary upload process performed by a designated node in accordance with one embodiment of the present invention.

FIG. 8B illustrates an exemplary process for decreasing file upload duration in accordance with one embodiment of the invention.

FIG. 8C illustrates a flowchart of an exemplary download process performed in accordance with one embodiment of the present invention.

FIG. 8D illustrates an exemplary process for global usage based file location manipulation in accordance with one embodiment of the invention.

FIG. 9 illustrates an exemplary IP address-to-geocode translation table in accordance with one embodiment of the invention.

FIG. 10A illustrates exemplary geocode regions surrounding two storage nodes in accordance with one embodiment of the invention.

FIG. 10B illustrates an exemplary node priority table in accordance with one embodiment of the invention.

FIG. 10C illustrates exemplary geocode regions based on longitude coordinates in accordance with one embodiment of the invention.

FIG. 10D illustrates a flowchart of an exemplary file location manipulation process in accordance with one embodiment of the invention.

FIG. 11A illustrates an exemplary environment where an exemplary inter-node load balancing process can be performed in accordance with one embodiment of the invention.

FIG. 11B illustrates a flowchart of an exemplary inter-node load balancing process performed at the core system in the exemplary environment of FIG. 11A in accordance with one embodiment of the invention.

FIG. 11C illustrates a flowchart of an exemplary inter-node load balancing process performed at a storage node in the exemplary environment of FIG. 11A in accordance with one embodiment of the invention.

FIG. 11D illustrates a flowchart of an exemplary intra-node load balancing combined with an inter-node load balancing process in accordance with one embodiment of the invention.

FIG. 12 illustrates a flowchart of an exemplary cleanup process in accordance with one embodiment of the invention.

FIG. 13 illustrates a flowchart of an exemplary process of storing files using an internet media file system (IMFS) in accordance with one embodiment of the invention.

FIG. 14 illustrates an exemplary download sequence that may be implemented using an IMFS core database in accordance with one embodiment of the invention.

FIG. 15 illustrates an exemplary file relocation and download sequence that may be implemented using an IMFS core database in accordance with one embodiment.

DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS

Various embodiments of the present invention are directed toward systems and methods for storage delivery network (SDN) systems that enable users to store, retrieve, and manipulate files from a remote location using a rich set of web service application programming interfaces (APIs). Embodiments of the invention are described herein in the context of exemplary applications. As would be apparent to one of ordinary skill in the art after reading this description, these applications are merely exemplary and the invention is not limited to operating in accordance with these examples. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.

In accordance with one embodiment, an SDN system may store, access, manipulate, and control folders and/or files over the Internet by utilizing three control layers: a virtual layer, a logical layer, and a physical layer.

FIG. 1 illustrates an exemplary SDN system 100 in accordance with one embodiment of the invention. The SDN system 100 may comprise a core system 102, which may control one or more distributed storage delivery nodes 112A, 112B-112K. The SDN system 100 may also comprise a customer application interface 110, which may serve a plurality of end users 114. The core system 102, the distributed storage delivery nodes 112A, 112B-112K, and the customer application interface 110 can communicate via a communication network such as the Internet 101.

The core system 102 may comprise a web services server 104, a firewall server 106, and an Internet media file system (IMFS) 108. It is understood that the core system 102 may comprise any number of servers (e.g., the web services server 104, firewall server 106) for performing its tasks and operations described herein. In addition, the various functionalities and operations described herein may be consolidated into a fewer number of servers or processors, or distributed among a larger number of servers or processors, as desired in accordance with network requirements.

The web services server 104 may accept requests from end users 114 (e.g., via customer application interface 110) related to accessing, storing and manipulating files stored on the SDN system 100. The web services server 104 may also redirect end users 114 to appropriate storage delivery nodes 112 during uploading and downloading of media files, for example.

The firewall server 106 provides a software application, which inspects network traffic passing through the web services server 104, and permits or denies passage based on a set of rules. A firewall's basic task is to regulate some of the flow of traffic between computer networks of different trust levels. Typical examples are the Internet, which is a zone with no trust, and an internal network, which is a zone of higher trust. A firewall's function within a network is to prevent unauthorized or unwanted network intrusion to the private network.

In accordance with one embodiment, the IMFS 108 includes a computer database and computer programs that provide file system services to the web services server 104. In one embodiment, the IMFS 108 includes a virtual file system (VFS) 105, and a logical file system (LFS) 107. The IMFS 108 may organize the storage of data using a database structure, such as a relational database structure. Examples of other database structures that may be used are hierarchical database and object oriented database structures. Database management systems may be included in the IMFS 108 to organize and maintain the database. The IMFS 108 may also comprise a computer or computers dedicated to running the IMFS 108.

In one embodiment, the core system 102 communicates with a customer application interface 110 via the Internet 101 in accordance with a web services protocol (e.g., Simple Object Access Protocol (SOAP) or Representational State Transfer (REST)). The customer application interface 110 provides requested files (e.g., music or video files) and services (e.g., video streaming) to a plurality of end users 114 who have purchased or subscribed to the customer application interface. In various embodiments, the customer application interface 110 can be a hosted website on a server, or an application running on a personal computer or other computing device (e.g., a mobile phone or personal digital assistant (PDA)).

With further reference to FIG. 1, physical end user files are stored in physical file storage (PFS) distributed across storage delivery nodes 112A, 112B-112K. Each distributed storage delivery node 112A, 112B-112K may include a plurality of processing servers 1-M, 1-N and 1-O respectively (where A, B and K, and M, N and O can be any positive integer value). In one embodiment, each distributed storage delivery node 112A, 112B-112K has a node manager database server, a transfer server for handling uploading and downloading of files, one or more processing servers for processing the files, and one or more storage servers for storing files after they have been processed. An exemplary storage delivery node 112 is explained in more detail below with reference to FIG. 7.

FIG. 2 illustrates an exemplary block diagram of an SDN system 200 in accordance with one embodiment of the invention. Various elements of SDN system 200 may be identical or similar to elements of SDN system 100 of FIG. 1. SDN system 200 includes a web services subsystem 202, an IMFS 204 (similar to IMFS 108 in FIG. 1), distributed storage delivery nodes 220 (similar to storage delivery nodes 112A, 112B-112K of FIG. 1), an account management subsystem 206, and a transaction warehouse/analytics subsystem 208. SDN system 200 may also comprise middle tier logic 210 coupled to the IMFS 204, storage delivery nodes 220, and the account management subsystem 206. SDN system 200 further includes a sharing engine subsystem 212 and server side processing applications 214. Each of these systems and applications is described in further detail below.

The web services subsystem 202 can provide an application program interface (API) to end users 114 (FIG. 1) via the Internet 101. In exemplary embodiments, the web services subsystem 202 operates industry standard REST and/or SOAP protocols allowing end users 114 to upload, copy, move and delete files and folders. Furthermore, end users 114 can retrieve a listing of their files stored in SDN system 200 and associated user defined tags and metadata. In one embodiment, the web services subsystem 202 presents the end user 114 with a tree-structured file system allowing the end users 114 to store and access files in a familiar fashion. In one embodiment, the file system is presented to the end user as a virtual hard drive on the end user's computing device. Communications between the end users 114 and core system 102 servers (FIG. 1) can use the Hypertext Transfer Protocol over Secure Socket Layer (HTTPS) protocol.

With further reference to FIG. 2, the IMFS 204 can include a Virtual File System (VFS) 216 and a Logical File System (LFS) 218 for managing files stored on the SDN system 200.

The VFS 216 can function as an abstraction layer on top of one or more conventional file systems to provide a uniform interface that is used to access data or files from one or more storage locations via a communications network. For example, VFS 216 can be an abstraction of a physical file storage system implementation, providing a consistent interface to multiple file and/or storage systems, both local and remote. In other words, the VFS 216 can allow end users 114 to access different types of file or file systems in a uniform way. The VFS 216 can, for example, be used to access local and remote network storage devices transparently without the client application noticing the difference. Additionally, in one embodiment, the VFS 216 can be used to bridge the differences in various types of file systems, so that client applications can access files on local or remote file systems without having to know what type of file systems directly control access to those files. Thus, the consistent interface provided by VFS 216 can allow the end users 114 to uniformly interface with a number of diverse file system types.

The VFS 216 stores end user information and controls end user directory structures (e.g., a tree structure) presented to end users 114 accessing files stored in SDN system 200. Directory structures can be presented to the end users 114 via the web services subsystem 202. As will be explained in further detail below, the VFS 216 includes a database that stores tables populated with information related to user files stored on the SDN system 200. For example, these tables can be populated by user folder names (e.g., “Scott's music”), user assigned file names (i.e., virtual file name), user overridden metadata, directory and/or path information, as well as virtual file identification (VFID) values associated with stored files. The VFID can be used to correlate each virtual file name with logical file and/or physical file information.

The LFS 218 provides an application with a consistent view of what can be, for example, multiple physical file systems and multiple file system implementations. In one embodiment, file system types, whether local, remote, or strictly logical, and regardless of implementation, are indistinguishable for applications using LFS 218. A consistent view of file system implementations is made possible by the VFS 216 abstraction. The VFS 216 abstraction specifies a set of file system operations that an implementation includes in order to carry out LFS 218 requests. Physical file systems can differ in how they implement these predefined operations, but they present a uniform interface to the LFS 218.

The LFS 218 stores information about files stored on SDN system 200, such as a media key (e.g., hash key), metadata, file size, file type, and the like. The LFS 218 also stores a logical file identification (LFID) value that is used to correlate or link a corresponding VFID with one or more physical files located in the distributed storage delivery nodes 112A, 112B-112K (FIG. 1). Thus, the LFS 218 acts as an intermediate layer that correlates the virtual layer with the physical layer. It is appreciated that many VFIDs may correspond to a single LFID, which in turn may correspond to one or more physical files distributed in various geographically distributed storage delivery nodes 112A, 112B-112K. For example, if multiple users have uploaded into their directory a song (e.g., “Wish You Were Here” by Pink Floyd), then multiple VFIDs corresponding to the respective multiple user songs may be correlated to a single LFID that identifies the common song. This single LFID may then be linked (e.g., via SQL relational database tables) to one or more physical files. For redundancy or access performance reasons, multiple physical files corresponding to the song may be stored in more than one storage server. However, it is not necessary to store a physical file for each user. Multiple users can share access to a single physical file. In this way, the SDN system 200 allows de-duplication of files, thereby saving a considerable amount of storage real estate.
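
By way of illustration only, the many-to-one relationship between VFIDs and an LFID, and the one-to-many relationship between an LFID and its physical copies, can be sketched with in-memory dictionaries. The identifier 12346, the node names, and the paths below are hypothetical values chosen for the example; an actual embodiment may use relational database tables instead.

```python
# Hypothetical stand-ins for the virtual and physical layers, keyed as
# described above: VFID -> LFID, and LFID -> physical copies.
virtual_files = {  # VFID -> (owner, virtual file name, LFID)
    12345: ("Scott", "Pirates", 101),
    12346: ("Rich", "Caribbean", 101),  # assumed VFID for the second user
}
physical_files = {  # LFID -> list of (node ID, path) copies (assumed values)
    101: [("ST01", "/share1/XYZ47"), ("ST02", "/share3/ABC12")],
}


def physical_copies(vfid):
    """Resolve a VFID to the physical copies of its file via the LFID."""
    _owner, _name, lfid = virtual_files[vfid]
    return physical_files[lfid]


def distinct_logical_files():
    """Count the unique LFIDs referenced by all virtual files."""
    return len({lfid for _, _, lfid in virtual_files.values()})
```

In this sketch both users' virtual files resolve to the same physical copies, so only one logical file needs to be stored.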

The distributed storage delivery nodes 220 (similar to 112A, 112B-112K in FIG. 1) comprise optional archival file storage (AFS) 222 and permanent file storage 224. The distributed storage delivery nodes 220 include physical file storage devices such as one or more hard drives. The AFS 222 may archive files, including compressed versions of files stored on or previously stored on the permanent file storage 224. Each storage delivery node 220 also stores one or more tables (e.g., relational database tables) populated by information indicating where files are stored within the respective storage delivery node. The tables may also be populated with path information for each file stored in the distributed storage delivery node 220, and information correlating each file with a logical file identification value (LFID).

Further to FIG. 2, the storage delivery nodes 220 can also include a cache file system 221, a hierarchical storage system 223 and a management and policy-based file replication system 225. The cache file system 221 may be used to temporarily store data before it is stored in a more permanent type of memory storage system. The hierarchical storage system 223 may be used to manage how data is stored in a hierarchical fashion. The management and policy-based file replication system 225 may be used for managing how many copies of each file are to be stored and whether copies of the files should be stored on high availability storage or archive storage, for example.

The SDN system 200 can also comprise an account management subsystem 206 that manages accounts for end users 114 and/or customers that have an account to access and use the SDN system 200. A customer may be, without limitation, a content and/or application provider. The account management subsystem 206 can, for example, control who can access certain applications and/or content, track usage, and calculate prices and payment data in accordance with a customer's service level agreement (SLA).

An SLA can be an agreement between one or more users and an SDN system administrator or customer, which provides a client interface application to the one or more users. The SLA specifies a level of service (e.g., quality of services, storage and access rights and preferences, etc.) to be provided to the users.

The transaction warehouse 208 can store archival information regarding transactions performed within the VFS 216, including billing, payment history and file operations. This allows for reporting information to be gathered historically.

The middle tier logic 210 does string validation and prepackages user-inputted data for entry into the IMFS 204. As data is returned from the IMFS 204, the middle tier logic 210 un-packages it for serialization and presentation to the end users 114. In one embodiment, end users 114 need not issue commands directly to the IMFS 204; rather, end user inputs are parsed and transmitted to the IMFS 204 via the middle tier 210. Data returned from the IMFS 204 may go through this same middle tier logic 210. This provides for additional security and command validation prior to entry into the SDN system 200.

In addition to providing secured access to uploaded files, users of the IMFS 204 may have the option of allowing access to individual virtual folders and files to other users. This is accomplished through the sharing subsystem 212 which can be directly correlated to the VFS 216. In this manner, once a user has sent the IMFS 204 a sharing command, a separate entry is created within the VFS 216 linked to the original record. Creation of the entry in the VFS 216 allows the end users 114 to share the file or folder using a different name for the file or folder, but without duplicating the file or folder. End users 114 see the virtual file or folder, and the VFS 216 provides the connection to the original file or folder. Additionally, access restrictions (by IP, password, and so on) can be added to a shared resource, allowing granular control over whom the user is granting access to. Sharing subsystem 212 may also perform public folder mapping functions and functions related to widget creation for APIs.

Uploaded files are processed into the VFS 216 and LFS 218 via a custom file system command processor service. The command processor service can be performed by command processing servers 214, which can determine the uniqueness of each file and perform transcode services as determined by a controlling SLA. Command processing servers 214 can also be used for processing new plug-ins, format translation, advanced tagging, image manipulation and video transcoding.

The command processing servers 214 can also perform metadata extractions to populate the LFS tables with metadata information as explained in more detail in the context of FIG. 5. In one embodiment, the command processing servers 214 can determine which commands need to be run through a queuing system operating, for example, Microsoft Message Queuing (MSMQ). Further, these queuing system commands can be added to command processing servers 214 without modifying the internal process of the command processing servers 214. The queuing system determines a priority order of each queuing command and balances them across each of the command processing servers 214.
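
As a rough sketch of the prioritizing and balancing behavior described above, the following example orders commands by priority and hands them out to servers in round-robin fashion. It does not use or model MSMQ; the function name dispatch, the numeric priorities (lower number meaning higher priority), and the round-robin policy are assumptions made for illustration.

```python
import heapq
from itertools import cycle


def dispatch(commands, servers):
    """Assign commands to servers in priority order, round-robin.

    commands: list of (priority, name); a lower number means higher priority
              (an assumed convention for this sketch).
    servers:  list of server names.
    """
    # The index i keeps commands of equal priority in arrival order.
    heap = [(priority, i, name) for i, (priority, name) in enumerate(commands)]
    heapq.heapify(heap)
    assignment = {server: [] for server in servers}
    for server in cycle(servers):
        if not heap:
            break
        _, _, name = heapq.heappop(heap)
        assignment[server].append(name)
    return assignment
```

For example, three commands spread over two servers leave the highest-priority command on the first server and the next on the second.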

FIG. 3 illustrates exemplary directory structures 300 for folders and files uploaded by two end users named Scott and Rich. These directory structures 300 may be represented by the following three virtual path names: Scott\music; Scott\video\movies\pirates.mov; and Rich\movies\caribbean.mov.

FIG. 4 illustrates exemplary VFS tables 400 that store user information corresponding to the directory structures and path names of FIG. 3. In one embodiment, the VFS tables 400 comprise SQL relational database tables and include a Virtual Folder Table 4A, a Virtual File Table 4B, and a Virtual Metadata Table 4C. The Virtual Folder Table 4A comprises a “Folder ID” column 402, a “Folder Name” column 404 and a “Parent Folder ID” column 406. As shown in FIG. 4, the Folder ID column 402 contains a unique folder ID value (e.g., values 1-6 in this example) for each user folder that is generated by the VFS 216. The “Folder Name” column 404 contains the name selected by the respective user for each folder (e.g., Scott, Music, etc. in this example). The names in column 404 may be, but need not be, unique. The Parent Folder ID (PFID) column 406 contains the unique Folder ID value of the parent folder of each respective child folder. If the folder is a root folder, its PFID value is null.
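
For illustration, a virtual path can be rebuilt from the Virtual Folder Table by following Parent Folder ID values up to a root folder. The rows below are an assumed assignment of Folder ID values 1-6 to the folders of FIG. 3 (the figure itself is not reproduced here), and the function name folder_path is hypothetical.

```python
# Assumed contents of the Virtual Folder Table:
# Folder ID -> (Folder Name, Parent Folder ID); None stands for a null PFID.
folders = {
    1: ("Scott", None),
    2: ("music", 1),
    3: ("video", 1),
    4: ("movies", 3),
    5: ("Rich", None),
    6: ("movies", 5),
}


def folder_path(folder_id):
    """Walk Parent Folder ID links to the root and return the virtual path."""
    parts = []
    while folder_id is not None:
        name, parent_id = folders[folder_id]
        parts.append(name)
        folder_id = parent_id
    return "\\".join(reversed(parts))
```

Note that the two folders named “movies” are distinguished by their Folder ID values, which is why the names in column 404 need not be unique.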

The Virtual File Table 4B comprises a “File ID” column 410, a “File Name” column 412, a PFID column 414 and a Logical File ID (“LFID”) column 416. The File ID column 410 contains a unique file ID value (e.g., 12345) that is generated for each user file, regardless of whether other users may have uploaded that identical file. The File Name column 412 contains the name of the file that is selected by its respective owner/user (e.g., Pirates and Caribbean in the present example). The PFID column 414 is similar to the PFID column 406 discussed above with respect to the Virtual Folder Table 4A. The PFID column 414 contains the Folder ID value 402 of the folder in which the file is stored. For example, the file named “Pirates” has a File ID 12345 and is stored in the folder associated with Folder ID “4” in column 414, which is the folder named “movies.” The LFID column 416 contains a value generated for each unique file. If a file is identical with another file, their LFID values may also be identical. Thus, multiple virtual files referencing identical data or content may have a single common LFID value (e.g., 101 in this example). This allows sharing and de-duplication of physical files, thereby reducing the number of physical files that must actually be stored in physical memory.

The Virtual Metadata Table 4C stores metadata that has been created by a respective end user to override pre-existing metadata contained within the original file. In one embodiment, the Virtual Metadata Table 4C contains a File ID column 418 and one or more Metadata Type columns 420. The Metadata Type columns 420 may include columns for image width, image height, video width, video height, video duration, video bit rate, video frame rate, audio title, artist, album, genre, track, bit rate, duration, and other desired information about data or media content. The Virtual Metadata Table 4C allows each user to customize a respective file to a limited extent without affecting whether de-duplication may be appropriate for that particular file. Since the overridden metadata resides only in the VFS 216, only the respective user may access or use that metadata. Furthermore, since the original physical file is not modified, its integrity remains intact and can be de-duplicated if an identical physical file was previously stored in the network.

FIG. 5 illustrates exemplary LFS tables 500, in accordance with one embodiment of the invention. The LFS tables 500 include a Logical File Table 5A, a Logical Node Table 5B, and a Logical Metadata Table 5C. The Logical File Table 5A comprises an LFID column 502, a “Media Key (Hash)” column 504, and a “File Size” column 506. The LFID column 502 stores a unique logical value for each unique file and serves as the linking parameter to the VFS tables 400 discussed above with respect to FIG. 4. The Media Key column 504 stores a unique algorithmically calculated value (e.g., media key or hash) for each unique data file. To illustrate, in the present example Scott's movie named “Pirates” and Rich's movie named “Caribbean” refer to the identical data file containing the movie “Pirates of the Caribbean”. Both Scott's movie and Rich's movie will be assigned the same LFID (e.g., 101), because the hash algorithm will generate an identical media key value or hash value. As shown in FIG. 5, one entry in the LFID column is “−1”, which, as discussed above, indicates a temporary value stored in one or more LFS tables 500. The “−1” entry remains until a media key is calculated by a designated storage node to determine whether the file can be de-duplicated or needs to be physically stored at a designated storage node. The File Size column 506 contains the file size value of the associated physical file.

The Logical Node Table 5B contains an LFID column 508, a Node ID column 510 and an Online column 512. The LFID column 508 links the Logical Node Table 5B with the Logical File Table 5A. The Node ID column 510 associates a unique value assigned to respective storage nodes in the distributed storage delivery nodes 112 with each LFID value. Thus, the Node ID column 510 indicates in which node 112 a physical file associated with an LFID is located. The Online column 512 contains a binary value that indicates whether a corresponding storage node is online or offline. Depending on a user's or customer's service level agreement (SLA), for example, a particular user's physical files may be stored at multiple physical locations for redundancy purposes. The particular user's physical files may also be stored at multiple physical locations to accommodate upload and download performance requirements for a particular application or file. Therefore, the copies of the physical file may be stored in multiple storage nodes. At various times, and for various reasons, one or more of such multiple storage nodes may be offline (e.g., due to hardware failure, down for maintenance, etc.). In the exemplary table, a “1” in the Online column 512 indicates the respective storage node is online and operational and a “0” indicates the corresponding storage node is offline.
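
A lookup against the Logical Node Table can be sketched as a simple filter over (LFID, Node ID, Online) rows. The row values and the function name online_nodes below are assumptions for illustration; they are not taken from FIG. 5.

```python
# Assumed rows of the Logical Node Table: (LFID, Node ID, Online flag).
logical_node_rows = [
    (101, 1, 1),  # copy of LFID 101 on node 1, online
    (101, 2, 0),  # copy of LFID 101 on node 2, offline
    (102, 3, 1),
]


def online_nodes(lfid):
    """Return the Node IDs that hold the given LFID and are marked online."""
    return [
        node_id
        for row_lfid, node_id, online in logical_node_rows
        if row_lfid == lfid and online == 1
    ]
```

A request for LFID 101 would be served only from node 1 in this example, because node 2 is flagged offline.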

The Logical Metadata Table 5C comprises an LFID column 514, which contains the LFID value for each unique logical file, and one or more Metadata Type columns 516 that contain the original, intrinsic metadata that was embedded with the original physical file. The Metadata Types can be identical or similar to those discussed above with respect to FIG. 4. If an end user has not overridden the original metadata with his or her own custom metadata, the original metadata contained in this table 5C is available as the default metadata to the end user.

FIG. 6 shows an exemplary Physical File Table 600. Physical File Table 600 includes an LFID column 602 which is used as a common linking parameter to link back to the VFS and LFS tables 400/500 discussed with reference to FIGS. 4A-4C and 5A-5C. In this manner, Physical File Table 600 links the distributed storage delivery nodes 112A, 112B-112K to the IMFS 108 via the LFID generated by the LFS 107. As discussed above, the LFID column 602 stores a unique identification value for each unique physical file that generates a unique media key value. The Physical Location column 604 stores location or path information that indicates the actual physical location of the file in memory. In FIG. 6, the illustrated path indicates that the file is stored in storage node “ST01” at server “Share1” within the storage node. In addition, the illustrated path indicates further branch names of 7.15, 15.45, and “XYZ47”. The branch name of 7.15 refers to the date the file was created. The branch name of 15.45 refers to the time the file was created. The branch name of “XYZ47” refers to an exemplary automatically generated pseudo-name of “XYZ47” generated by the node processing server (e.g., by hashing the original name of the file).

As discussed above, the LFS 218 can store information indicating the storage node or storage nodes in which each file is stored. In accordance with one embodiment, the LFS 218 stores information indicating that the file exists somewhere within a storage node, but does not indicate where the file is located within that storage node. Instead, each storage delivery node 112 can autonomously control the placement of files within itself. Moreover, the Physical File Table stored within each respective storage node contains the information indicating where each file stored within that storage node is located.

In one embodiment, the VFS tables 400 are stored in a separate database from the LFS tables 500. Both the VFS tables 400 and the LFS tables 500 are separate from Physical File Tables 600, which are stored at respective geographically distributed storage delivery nodes 112. By providing three distinct layers (e.g., the virtual, logical and physical layers) the SDN system 100 de-couples user information from the actual physical files belonging to each of the end users 114. In order to search for and/or utilize information, a hacker would need to infiltrate at least three separate databases and correlate a vast amount of information to determine which file belongs to which user or customer. Furthermore, a hacker would not likely know in advance whether any particular storage node database has any of the physical files a hacker may be interested in. This de-coupling and de-identification of files from users provides added security to sensitive information such as financial and bank account information. The de-coupling and de-identification of files from users may be used to meet HIPAA requirements for de-identification of patient related information and medical records, for example.

FIG. 7A illustrates an exemplary storage node architecture 700 in accordance with one embodiment of the invention. The storage node architecture 700 includes one or more upload and processing servers 702, one or more transfer servers 707, one or more storage servers 704, a download server 706, a node manager database server 708, and an archive storage node 710. In one embodiment, the archive storage node 710 provides a cheaper form of storage (e.g., magnetic tape) than server storage drives, and stores archive files which do not need to be accessed quickly and/or frequently. In exemplary embodiments, frequently accessed user files are stored in one or more storage nodes having high-availability (HA) storage servers, whereas less frequently accessed files can be stored in separate archive storage nodes. The HA storage servers can be systems with directly attached storage devices. Alternatively, those servers can be attached to network attached storage (NAS) or a storage area network (SAN).

Various server configurations may be implemented in accordance with design requirements and considerations. For example, upload and download functionalities can be performed by transfer server 707 instead of separate servers 702 and 706. In addition, processing functionalities can be implemented by a separate server. Furthermore, node manager database server 708 can control and keep track of where files are stored among the storage servers 704 of storage node 700.

In one embodiment, files can be stored at an archive storage node and copied to an HA storage node when the file is in demand (e.g., being accessed by a user), for example. A file may thereafter be deleted from the HA storage node when the file is no longer in demand. An ageing algorithm can be used to determine when the file should be deleted from the HA storage. Thus, a copy of a file can be maintained on the archive storage node 710, copied to an HA storage node when the file is in demand (e.g., when a file is frequently accessed), and deleted from the HA storage node when the file is no longer in demand.

FIG. 7B is a flowchart illustrating an exemplary process 750 for moving requested files from one storage node to another in accordance with one embodiment of the present invention. The various tasks performed in connection with process 750 may be implemented by software, hardware, firmware, a computer-readable medium storing computer executable instructions for performing the process method, or any combination thereof. It should be appreciated that process 750 may include any number of additional or alternative tasks. The tasks shown in FIG. 7B need not be performed in the illustrated order, and process 750 may be incorporated into a more comprehensive procedure or process having additional functionality not described in detail herein. For illustrative purposes, the following description of process 750 may refer to elements mentioned above in connection with FIGS. 1-7A. In various embodiments, portions of process 750 may be performed by different elements of systems 100-700, such as core system 102, customer application interface 110, and the distributed storage delivery nodes 112A, 112B-112K. Tasks of process 750 may be performed as backend processes that are transparent to the user.

At a step 752, a user requests access to a file stored on an archive node. The requested file is then copied from the archive storage node to an HA storage node at step 754. A time since last access date (LAD) of the file stored on the HA storage node can then be periodically monitored at step 756 to determine if the file is in demand. In this regard, the LAD can be compared to a predetermined threshold at decision step 758. The predetermined threshold can correspond to a predetermined time period, e.g., 30 days. If the LAD exceeds the threshold (Yes branch of decision step 758), then the file is deleted from the HA storage node at step 762. If the LAD does not exceed the threshold (No branch of decision step 758), then the file is maintained on the HA storage node at step 760 and the LAD is periodically monitored again at step 756. If the file is requested after the file has been deleted from the HA storage node, then process 750 may be repeated.
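
The monitoring and deletion decision of steps 756-762 can be sketched as a periodic sweep over the files held on an HA storage node. This is a minimal sketch under stated assumptions: the function names should_evict and sweep, the dictionary of last-access times, and the use of the exemplary 30-day threshold are illustrative and do not represent the only way the ageing algorithm may be implemented.

```python
from datetime import datetime, timedelta

# Exemplary threshold from the description above (30 days).
THRESHOLD = timedelta(days=30)


def should_evict(last_access, now, threshold=THRESHOLD):
    """Decision step 758: True if the time since last access exceeds the threshold."""
    return now - last_access > threshold


def sweep(ha_files, now):
    """One monitoring pass (step 756) over an HA storage node.

    ha_files: dict mapping file name -> last access datetime (assumed layout).
    Deletes stale entries (step 762) and returns their names; all other
    files are kept (step 760). The archive copy is not touched.
    """
    stale = [name for name, lad in ha_files.items() if should_evict(lad, now)]
    for name in stale:
        del ha_files[name]
    return stale
```

A scheduler would call sweep at whatever monitoring interval the embodiment chooses; a later request for a deleted file would trigger a fresh copy from the archive node.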

With reference to FIG. 1, when an upload request from an end user 114 is received by the core system 102, the core system 102 can redirect the end user 114 to one of the storage delivery nodes 112A, 112B-112K for uploading the requested file. The end user's connection to the core system 102 is then severed, and a connection is established with the upload server 702 at the storage delivery node 112. The node 112 may then begin accepting data packets of the file from end user 114.

FIG. 8A is a flowchart of an exemplary file upload process 800 in accordance with one embodiment of the present invention. The various tasks performed in connection with process 800 may be implemented by software, hardware, firmware, a computer-readable medium storing computer executable instructions for performing the process method, or any combination thereof. It should be appreciated that process 800 may include any number of additional or alternative tasks. The tasks shown in FIG. 8A need not be performed in the illustrated order, and process 800 may be incorporated into a more comprehensive procedure or process having additional functionality not described in detail herein. For illustrative purposes, the following description of process 800 may refer to elements mentioned above in connection with FIGS. 1-7. In various embodiments, portions of process 800 may be performed by different elements of systems 100-700, such as core system 102, customer application interface 110, and the distributed storage delivery nodes 112A, 112B-112K. Tasks of process 800 may be performed as backend processes that are transparent to the end user 114.

When an incoming file 802 is received, the upload server 804 stores the file in an upload cache memory 806. The VFS 105 also creates a folder path or virtual file for the end user 114 and assigns a temporary LFID (task 808). The temporary LFID may, for example, be a negative LFID value as discussed with reference to FIG. 5A. The temporary LFID allows the end user to access the newly uploaded file immediately via a download server (e.g., server 706 of FIG. 7). In this manner, the impact of file processing delays on a user's ability to access the file can be decreased or eliminated. The upload server 804 then notifies the node's internal processing server 812 by adding an entry (task 810) into a processing queue. The entry can contain information such as a physical location of the file to be uploaded (e.g., a location of the end user's computer), the VFID associated with the file, an account ID associated with the end user 114, an application key ID, a temporary location of the file, and the like.

With further reference to FIG. 8, processing server 812 applies a hashing algorithm to the uploaded file to calculate a media key for the file (task 814). The hashing algorithm can be the MD5 file hashing algorithm (RFC 1321), for example. The result from the hashing algorithm can be referred to herein as a “hash” or a “media key”. Once this media key is created, the processing server 812 may provide a copy of the media key to the LFS 105 (FIG. 1), in accordance with one embodiment of the invention. The LFS 105 may compare the media key to other media keys in its Logical File Tables (FIG. 5A) to determine if an identical media key exists (inquiry 816). An identical media key indicates that an identical file is already stored on the system 100. If an identical file is already stored on the system 100 (“Yes” branch of inquiry task 816), then the temporary LFID is replaced with a permanent or real LFID associated with the previously stored identical file and the end user's VFID is updated with the real LFID (task 818). Since an identical file is already stored on the system, the recently uploaded file can be deleted (task 820).

If the LFS 218 determines that an identical copy of the file is not already stored on the system 200 (No branch of inquiry task 816), then the LFS 218 extracts metadata from the recently uploaded file (task 822) and creates logical file tags (task 824) for storage in a metadata table (FIG. 5C) within the LFS 218. The newly uploaded file may then be assigned a unique LFID, which is stored in LFS 218. The uploaded file is stored in a storage node 112 (FIG. 1) and the Physical File Table stored in a node manager database of the storage node 112 is updated with the LFID associated with the file and a physical location of the file within the node (task 826). The LFS 218 is also updated with a Node ID indicating in which node the file is stored (task 828).
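
The de-duplication decision of process 800 can be illustrated with a small in-memory sketch: a media key is computed with MD5, compared against previously stored keys, and the temporary LFID is replaced either with the existing LFID or with a newly assigned one. The class name LogicalFileStore, the function process_upload, the dictionary layout of a virtual file, and the starting LFID of 101 are assumptions for illustration only. Metadata extraction and physical storage (tasks 822-828) are omitted.

```python
import hashlib
import itertools

TEMP_LFID = -1  # temporary value assigned at upload time (task 808)


class LogicalFileStore:
    """Toy stand-in for the Logical File Table: media key -> LFID."""

    def __init__(self):
        self.media_keys = {}
        self._next_lfid = itertools.count(101)  # assumed starting value

    def register(self, data: bytes):
        """Return (lfid, is_duplicate) for the uploaded bytes."""
        key = hashlib.md5(data).hexdigest()      # task 814
        if key in self.media_keys:               # inquiry 816, Yes branch
            return self.media_keys[key], True
        lfid = next(self._next_lfid)             # No branch: new unique LFID
        self.media_keys[key] = lfid
        return lfid, False


def process_upload(lfs, virtual_file, data):
    """Replace the temporary LFID on a virtual file; report duplicates."""
    lfid, is_duplicate = lfs.register(data)
    virtual_file["lfid"] = lfid                  # task 818 or new assignment
    return is_duplicate                          # caller may delete the upload
```

When two users upload identical bytes, the second call reports a duplicate and both virtual files end up sharing one LFID, so the second physical copy can be discarded.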

FIG. 8B illustrates an exemplary process 830 for decreasing file upload duration in accordance with one embodiment of the invention. The various tasks performed in connection with process 830 may be implemented by software, hardware, firmware, a computer-readable medium storing computer executable instructions for performing the process method, or any combination thereof. It should be appreciated that process 830 may include any number of additional or alternative tasks. The tasks shown in FIG. 8B need not be performed in the illustrated order, and process 830 may be incorporated into a more comprehensive procedure or process having additional functionality not described in detail herein. For illustrative purposes, the following description of process 830 may refer to elements mentioned above in connection with FIGS. 1-7A. In various embodiments, portions of process 830 may be performed by different elements of systems 100-700, such as core system 102, customer application interface 110, and the distributed storage delivery nodes 112A, 112B-112K. Tasks of process 830 may be performed as backend processes that are transparent to the end user 114.

Process 830 may begin when a designated node begins receiving a file from an end user (task 832). In one embodiment, a media key is calculated by a process local to the file being uploaded. This user-side media key is received shortly after or concurrently with receiving the file being uploaded (task 834) and compared to previously generated and stored media keys (task 836). In one embodiment, a periodically updated table containing all the previously generated media keys is stored at each node for comparison with received user media keys. In an alternative embodiment, the previously generated media keys may be stored in the LFS table 500 (FIG. 5) residing in the core system 102. In this embodiment, the designated node may transmit the received user media key to the core system 102 for comparison with previously stored media keys. In one embodiment, a program (e.g., hash algorithm) is downloaded or installed on the end user's computer to generate the user-side media key. The program may be any type of hashing algorithm, for example, as long as it is the identical program used by the core system 102 to calculate the media keys stored in the LFS table 500 or within memory tables in each node. A match between the user-side media key and a previously stored media key indicates an identical file already exists on SDN system 100. In this way, a determination is made as to whether a file identical to the file being received has previously been stored in system 100 (task 838). If a match is found and uploading has not been completed, then the upload can be aborted (task 840) and a “successful upload” message can be immediately sent to the end user (task 842). The file associated with the matched media key already stored on the SDN system 100 can then be designated as a file associated with the end user (task 842). In this manner, unnecessary uploading of a previously existing file is aborted, thereby avoiding storing a duplicate file on the system and decreasing file upload duration.
If thematch is not found (No branch of inquiry task 838), uploading of thefile continues until it is completed (task 846) after which a“successful upload” message is sent to the end user (task 847). Finally,the newly uploaded file is designated as a file that is accessible bythe end user (task 848) and stored on the system 100, as described inprocess 800 of FIG. 8A, for example.
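
By way of illustration only, the media key comparison of process 830 might be sketched as follows. The sketch assumes SHA-256 as the hashing algorithm (the description permits any hashing algorithm shared by the user side and the system), and the names media_key, handle_upload, stored_keys and user_files are hypothetical stand-ins for the stored media key table and the file-to-user association.

```python
import hashlib


def media_key(data: bytes) -> str:
    """Hypothetical media key: the SHA-256 hex digest of the file contents."""
    return hashlib.sha256(data).hexdigest()


def handle_upload(user_id, user_key, chunks, stored_keys, user_files):
    """Sketch of tasks 836-848.

    user_key    -- media key computed on the user's side
    chunks      -- iterator yielding the file's bytes as they arrive
    stored_keys -- dict: media key -> stored file bytes (assumed key table)
    user_files  -- dict: user id -> set of media keys the user may access
    """
    if user_key in stored_keys:
        # Match found: abort the transfer, report success, and associate
        # the already-stored file with this user.
        user_files.setdefault(user_id, set()).add(user_key)
        return "deduplicated"
    # No match: let the upload run to completion, then store the new file.
    data = b"".join(chunks)
    key = media_key(data)
    stored_keys[key] = data
    user_files.setdefault(user_id, set()).add(key)
    return "uploaded"
```

In this sketch the second user to upload an identical file never transfers its bytes; only the key lookup is performed.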

In one embodiment, when a download request (a.k.a., a retrieve or “get” request) is received by the core system 102 (FIG. 1), the core system 102 determines which one or more distributed storage delivery nodes 112 contain the requested file and which of those storage nodes is closest to the end user 114. The end user 114 is redirected to that storage node. The user's connection to the core system 102 may be severed at this point. It can be noted that just because a storage node is closest to the end user 114 does not necessarily mean that the storage node is “near enough” to the user's device. For example, even though a first node may be determined to be “near enough,” a customer's SLA can dictate that a second, different node needs to be used to service the end user. Thus, policies in a customer's SLA can override which node is deemed appropriate.

As used herein, an “end user” is an entity that requests uploading and downloading of files from the SDN. A “customer” can be an end user or, in some instances, a content provider that provides services to many end users, and which has an SLA with the core system operator. In one embodiment, policies in a customer's SLA may override some or all intrinsic features of the SDN's storage and file manipulation rules. For example, a customer may choose to store files wholly within the continental United States, dictating that those files must never be shipped overseas. In this scenario, the logic in the SDN will enforce the policy by overriding any conflicting rules, ensuring this customer's files are never transmitted to restricted nodes during load balancing, file protection or file migration activities, for example. Customers may choose to “lock” their files to a node or series of nodes or within a geographical region. Additionally, customers may require that only nodes capable of providing a specified quality of service (e.g., no wait or queuing) can be used to service requests for the customer or the customer's clients.

Additionally, a customer may also dictate that any file received by the system must immediately be copied to one or more additional nodes, which may or may not be specifically designated. This provides redundancy and security against data loss and/or corruption even in the event of catastrophe, and can improve performance or quality of service to that specific customer. For example, if the customer frequently travels to California, New York and Europe, the customer may dictate that a copy of each of his or her files be stored in a node geographically situated in each of these regions to minimize latency when he or she requests files from any of these regions.

As a further example, a customer's SLA may dictate that certain groups of end users, which subscribe to the customer's services, be designated for service by specific nodes. For example, a group policy may be set for a specific group of users to be served by specified storage nodes managed by the customer. In this way, node access and utilization may be controlled or optimized by the customer with respect to the customer's subscribers, in accordance with various objectives or criteria specified by the customer (e.g., subscriber management, accounting, and/or other customer business objectives).

Thus, policies set forth in a customer's SLA can override or supplement the SDN file allocation and manipulation rules described herein. Some non-exclusive examples of policies that can be specified in a customer's SLA include: always maintain a predetermined number (e.g., 2) of redundant copies of all files associated with the customer in the SDN; only store the customer's files in one or more pre-specified types of nodes or geographic regions; always serve requests associated with the customer's account using the fastest available node; always serve requests associated with the customer's account using the closest available node; requests associated with the customer's account must be served within a maximum latency threshold or satisfy predetermined quality of service criteria; etc. In one embodiment, a customer's SLA is always checked before moving, copying, storing, or providing access to files associated with the customer. In one embodiment, each customer's SLA and policies associated therewith are stored in a database coupled to the core system 102 (FIG. 1). In further embodiments, all or a subset of all customer SLAs may be redundantly stored at designated storage nodes such that the designated storage nodes can notify the core system 102 if a directed action violates one or more policies of a relevant customer's SLA. Upon receiving such notification, the core system 102 can take any appropriate remediation measures.

In one embodiment, when a download request is received by the designated node, the node manager database server 708 (FIG. 7) determines which storage server 704 within the storage node 700 houses the file. A transfer server 707 requests the file location from the node manager database server 708 and then requests the file from the identified storage server (e.g., via a “share” request). The identified server then transfers the file to the transfer server 707 which then passes the file to the requesting user (assuming the user has proper access rights). In one embodiment, the user's connection does not “touch” the servers on which their files are stored. Instead, the end user's connection may access files via a web services proxy agent. The web services proxy agent in turn interfaces with a node download server 706 or transfer server 707, but does not interface with the actual storage server 704 in the storage node 700 (FIG. 7).

FIG. 8C illustrates a flowchart of an exemplary download process 850 in accordance with one embodiment of the present invention. The various tasks performed in connection with process 850 may be implemented with software, hardware, firmware, a computer-readable medium storing computer executable instructions for performing the process, or any combination thereof. It should be appreciated that process 850 may include any number of additional or alternative tasks. The tasks shown in FIG. 8C need not be performed in the illustrated order, and process 850 may be incorporated into a more comprehensive procedure or process having additional functionality not described in detail herein. For illustrative purposes, the following description of process 850 may refer to elements mentioned above in connection with FIGS. 1-7. In various embodiments, portions of process 850 may be performed by different elements of systems 100-700, e.g., core system 102, the customer application interface 110, and the distributed storage delivery nodes 112. The tasks of process 850 may be performed as backend processes that are transparent to the end user 114.

It can be noted that process 850 can perform authentication and authorization before actually “serving out the bytes” (i.e., transmitting the file). If the end user 114 is authenticated and authorized to download the file, then the file's content may be streamed to the requesting client (end user). After the request ends, process 850 may record the actual number of bytes served for accounting purposes. This can happen even if the client aborts the download, in which case the number of bytes served up to that point can be recorded.

At task 852, an incoming download request is received by transfer services server 854. The download request may be a request redirected from core system 102 (FIG. 1) to a storage delivery node 112, for example. The transfer services server 854 can be similar to server 707 of FIG. 7A and be located within the storage delivery node 112. The transfer services server 854 may then communicate with the core system 102 (FIG. 1) for the purpose of authenticating the end user 114 associated with the download request (inquiry task 856). If the end user 114 is not authenticated (“No” branch of the inquiry task 856), then the request is terminated.

If the user is authenticated (“Yes” branch of the inquiry task 856), then the core system 102 determines the identity of a storage node containing the requested file and returns a physical path for that node to the requester's computer (task 858). In one embodiment, if multiple nodes are identified as containing the requested file, the core system 102 selects the node that is closest and/or least busy, or makes its node selection based on some combination of these factors. The physical path for the selected node is correlated with an LFID associated with the user's virtual file path for the requested download file. The local node manager database server 708 at the selected node (FIG. 7A) may further determine the physical location of the file within the node given the LFID (task 858) using Physical File Table 600 (FIG. 6). Once the physical location of the file is determined, the node manager database server 708 then requests the file from the proper storage server 862 (task 860). In one embodiment, the proper storage server is the least busy storage server in the node that contains the requested file. The transfer services server 854 then receives the data packets of the file from the proper storage server 862 and thereafter transmits the file to the requester (task 855). In one embodiment, the file is transferred from the transfer services server 854 to the user via an HTTP proxy download program (task 868). The transfer services server 854 may then notify the IMFS 108 of the number of bytes transferred to the user for accounting purposes (task 870).

FIG. 8D illustrates an exemplary global usage based file location manipulation process 880 in accordance with one embodiment of the invention. The various tasks performed in connection with process 880 may be implemented by software, hardware, firmware, a computer-readable medium storing computer executable instructions for performing the process, or any combination thereof. It should be appreciated that process 880 may include any number of additional or alternative tasks. The tasks shown in FIG. 8D need not be performed in the illustrated order, and process 880 may be incorporated into a more comprehensive procedure or process having additional functionality not described in detail herein. For illustrative purposes, the following description of process 880 may refer to elements mentioned above in connection with FIGS. 1-7A. In various embodiments, portions of process 880 may be performed by different elements of systems 100-700, such as core system 102, customer application interface 110, and the distributed storage delivery nodes 112A, 112B-112K. Tasks of process 880 may be performed as backend processes that are transparent to the end user 114.

Process 880 may begin by receiving a download request at step 882. The download request can be sent from end user 114 and received by core system 102, for example.

The core system 102 then identifies the nearest node containing the requested file in step 884. For example, the core system 102 can determine an LFID associated with the file download request and identify which nodes contain files associated with the LFID using the Logical Node Table described with reference to FIG. 5B. Since copies of a file can be stored in a plurality of nodes, a plurality of nodes may be identified in step 884. As explained in further detail below, in one embodiment, when a plurality of nodes contain copies of the file, a comparison, such as a geocode comparison between the user's geocode and each identified node's geocode, may be used to determine which of those nodes is the nearest node or a “near enough” node. Alternatively, a lookup table such as a Node Priority Table 1070, described in further detail below, can be accessed to determine which nodes can serve the user based on his or her geocode. Once the available nodes are identified, the core system 102 can determine which of those nodes contains the requested file and thereafter redirect the user's request to the highest ranked node for that user's geocode as specified in the Node Priority Table 1070.

The core system 102 can then determine whether the nearest node is a “near enough” node at decision step 886. Just because a node is determined to be nearest to the user in step 884 does not necessarily mean that the node is “near enough.” As used herein, a “near enough node” can refer to a node that is deemed to be sufficient to process a user's request based on various criteria. The criteria can be strictly a distance between the user and a node or can also include additional or alternative factors, such as quality of service a node can provide to the user. The criteria used to determine whether a node is “near enough” can also be specified by an SLA governing the user's request.

If the nearest node is determined to be “near enough”, then the core system 102 redirects the download request and all subsequent requests from the user to that node at step 888. Thus, a subsequent request from the user need no longer pass through the core system 102, but instead can directly access the file from the node. In one embodiment, the customer application interface stores the initial download request details, and subsequent requests for the same file by the same IP address are redirected to the previously identified “near enough” storage node.

If none of the nodes containing the file qualify as a “near enough” storage node, then the core system 102 temporarily redirects the user to the nearest node (also referred to as “first node” in this example of FIG. 8D) containing the file at step 890. In other words, the first node serves the download request for the user, but subsequent requests may be directed to a different node.

Next, the core system 102 determines the identity of a “near enough” node at step 892, and instructs the “near enough” node to get a copy of the requested file from the first node at step 894. Accordingly, after step 894, both the first node and the “near enough” node have a copy of the requested file. The core system can then notify the customer application interface of the new “near enough” node's IP address so that subsequent requests for the same file by the same user IP address are directed automatically to the new node identified at step 892.

In a further embodiment, at decision step 894, the core system 102, or a clean up program located at the node, can periodically compare a time since the requested file had been last accessed (LAD) at the “near enough” node with a predetermined threshold. The predetermined threshold can correspond to a period of time, e.g., 10 days. If the LAD exceeds the threshold, then the file at the “near enough” node is deleted in step 896. If the LAD does not exceed the threshold, then the “near enough” node is designated as the primary storage node at step 897 and the copy of the file on the first storage node is deleted at step 898. In this manner, process 880 can move files to nodes which better serve users. Moreover, duplication of files can be reduced by deleting copies of files that are not frequently accessed.
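
As a non-limiting sketch, the periodic LAD comparison described above could be expressed as follows. The function name cleanup_action and the returned labels are hypothetical; the 10-day default mirrors the example threshold given above.

```python
from datetime import datetime, timedelta


def cleanup_action(last_accessed: datetime, now: datetime,
                   threshold: timedelta = timedelta(days=10)) -> str:
    """Decide what to do with the copy held at the "near enough" node.

    last_accessed -- time the file was last accessed at the "near enough" node
    now           -- time at which the periodic check runs
    threshold     -- predetermined period of time (10 days in the example)
    """
    if now - last_accessed > threshold:
        # Copy has gone unused for too long: remove it (step 896).
        return "delete_near_enough_copy"
    # Copy is in active use: make the "near enough" node primary (step 897)
    # and remove the copy on the first node (step 898).
    return "promote_near_enough_delete_first"
```

Either outcome leaves a single copy of the file, which is how the check reduces duplication.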

In accordance with various embodiments, a node or other network resource is determined to be “near enough” by determining a physical location associated with a user computer by translating its IP address into a geocode and, thereafter, comparing this geocode with a geocode associated with one or more nodes or other network resources. One or more nodes or network resources (e.g., servers) are then assigned to service the user's request (e.g., an upload or download request) based at least in part on the location of the network resource relative to the location of the user's computer as determined by respective geocodes associated with the user's computer and the network resource.

Geocodes are known in the art and used, for example, by the U.S. Postal Service to assign codes to geographic regions or areas. In general, a geocode is a code that represents a geospatial coordinate measurement of a geographic location and time. A geocode representation can be derived, for example, from the following geospatial attributes: latitude, longitude, altitude, date, local time, global time and other criteria, such as how the area is coded (e.g., number, letter, mixture of both, or other), which part of the earth is covered (e.g., whole earth, land, water, a continent, a country, etc.), what kind of area or location is being coded (e.g., country, county, airport, etc.), and/or whether an area or point is being coded. Generally, a geocode is a numerical representation that takes into account some or all of the above criteria.

Every computer or device that communicates over the Internet has a unique Internet Protocol (IP) address assigned to it. Computers and devices residing within a pre-determined geographic region or area are typically assigned a specified range of IP addresses. For example, all computers within Japan may have IP addresses in the range of 43.0.0.0-43.255.255.255 (Source: IANA, Japan Inet, Japan (NET-JAPAN-A)).

In one embodiment, when a user or customer makes an upload (a.k.a., “put” or “store”) or download (a.k.a., “get” or “retrieve”) request, via a web services interface, for example, the request is received by core system 102 which translates the IP address associated with the incoming request into a geocode. The core system 102 looks up a table that correlates IP addresses with geocodes, or IP address ranges with geocode ranges. After the IP address has been translated into a geocode, the system compares the geocode to the geocodes that have been assigned to storage nodes within the network and determines, algorithmically, which resources are “nearest” the requester. If only one resource is “near enough,” the user is redirected to that resource. If multiple resources are “near enough,” the system may determine which of the resources is currently experiencing the lightest volume of requests (e.g., via updatable polling) and redirect the requester to that resource. Or, in an alternative implementation, the requester may be directed to the absolute nearest resource, regardless of the current volume of requests being handled by that nearest resource.
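
For purposes of illustration, the translation and redirection just described might be sketched as follows. The table contents are fictional (reserved documentation address ranges and made-up geocode values), and the names IP_TO_GEOCODE, geocode_for_ip and choose_resource are hypothetical.

```python
import ipaddress

# Hypothetical translation table: IP address range -> geocode.
IP_TO_GEOCODE = [
    (ipaddress.ip_network("192.0.2.0/24"), 11234),
    (ipaddress.ip_network("198.51.100.0/24"), 15678),
]


def geocode_for_ip(ip: str):
    """Translate a request's IP address into a geocode, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for network, geocode in IP_TO_GEOCODE:
        if addr in network:
            return geocode
    return None


def choose_resource(near_enough, loads):
    """Pick a resource from those already found to be "near enough".

    near_enough -- list of resource ids
    loads       -- dict: resource id -> current volume of requests
    """
    if not near_enough:
        return None
    if len(near_enough) == 1:
        return near_enough[0]
    # Several candidates: prefer the one handling the lightest load.
    return min(near_enough, key=lambda resource: loads[resource])
```

The alternative implementation, which ignores load and always uses the absolute nearest resource, would replace the final min over loads with a min over distances.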

FIG. 9 illustrates an exemplary IP address to Geocode translation table 900, in accordance with one embodiment of the invention. A periodically updated copy of this table 900 may be stored at the core system 102 and at each of the distributed storage delivery nodes 112 within the SDN system 100. As discussed above, computers within a particular geographic region or area are typically assigned IP addresses within a range of addresses. FIG. 9 shows some fictional IP addresses 902 and geocodes 904. Generally, IP addresses 902 may include four numerical values separated by periods, similar to that shown in FIG. 9. For example, computers within San Diego county may be assigned an IP address of 192.168.1.X, where X differentiates individual IP addresses within the county. The correlation between IP addresses and geographic areas and regions can be obtained from publicly available sources. For example, third party vendors such as IPLigence may provide such information for a fee. After the IP addresses 902 have been correlated to corresponding geographic areas, this information can then be used to map IP addresses to geocodes 904 based on the correlated geographic information. As previously mentioned, geocodes 904 are known types of codes used by the postal service, for example, to code geographic areas and regions to indicate relative distances and positions between the geographic areas.

In one embodiment, a geocode may comprise at least five numerical fields a-e. As shown in FIG. 9, a first field (a) may indicate a continent (e.g., “7”=Asia), a second field (b) may indicate a country, a third field (c) may indicate a state or region, a fourth field (d) may indicate a city and a fifth field (e) may indicate a postal code, for example. The values of the geocodes are such that a large difference between two geocodes indicates a large distance between the respective geographic regions corresponding to the geocodes. For example, if two geocodes differ in value in the first field of a geocode, then it is known that the corresponding geographic areas are on different continents and quite far from each other. Thus, by storing a geocode for each IP address associated with all users and network resources, relative distances between user devices and network resources can be calculated by calculating the absolute value of the difference between respective geocodes. It is understood that the geocode shown in FIG. 9 is exemplary and other formats and fields may be implemented in accordance with desired criteria and/or applications.

In one embodiment, the core system 102 may determine distances between storage nodes and a user's device, or whether the storage node is “near enough” to the user device, by calculating the absolute value of the difference between the storage node's geocode and the user's geocode. In one embodiment, a storage node is determined to be “near enough” if an absolute value of its corresponding distance is lower than a predetermined threshold value. In further embodiments, additional criteria may be considered to determine whether a node is “near enough,” or should be selected to service the user's file request. Such additional factors may include, for example, how busy the node is, as measured by the number of current accesses to the storage node, or number of accesses to a file within a specified time period by a user, bandwidth of the network, speeds of the communication links on the network, quality of service (QoS) of communications on the network, policies and rules as determined by a user's or customer's SLA, master internet trunk information, relative connectivity of the storage nodes within the network, the relative performance capabilities of the node as compared to other nodes, etc. In various embodiments, various combinations of the above factors may be utilized and considered by logic residing in the core system 102 and/or logic within nodes to determine which one of a plurality of nodes should handle the user's request and subsequent requests by the same user.
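
By way of example only, the geocode difference calculation may be sketched as below. The digit widths assigned to fields a-e, the field values, and the names FIELD_WIDTHS, encode_geocode, geocode_distance and is_near_enough are assumptions made for illustration; the description does not fix a particular encoding.

```python
# Hypothetical number of decimal digits reserved for fields a-e
# (continent, country, state/region, city, postal code).
FIELD_WIDTHS = (1, 3, 3, 4, 5)


def encode_geocode(a: int, b: int, c: int, d: int, e: int) -> int:
    """Pack fields a-e into one integer, with field (a) most significant."""
    value = 0
    for field, width in zip((a, b, c, d, e), FIELD_WIDTHS):
        if not 0 <= field < 10 ** width:
            raise ValueError("field value does not fit its width")
        value = value * 10 ** width + field
    return value


def geocode_distance(geocode_1: int, geocode_2: int) -> int:
    """Relative distance: absolute value of the geocode difference."""
    return abs(geocode_1 - geocode_2)


def is_near_enough(node_geocode: int, user_geocode: int, threshold: int) -> bool:
    """A node is "near enough" if its distance is below the threshold."""
    return geocode_distance(node_geocode, user_geocode) < threshold
```

Because the continent field occupies the most significant digits in this sketch, two geocodes that differ in field (a) always yield a larger difference than two that differ only in the postal code field, consistent with the property described above.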

In an alternative embodiment, the relative distances between nodes and various geographic regions can be used to create a Node Priority Table that specifies which nodes have priority with respect to serving end users in each geographic region. In this embodiment, to determine whether a node is “near enough,” the core system need not perform any geocode subtractions but simply looks up the Node Priority Table to determine which nodes are designated to serve a particular user request based on a geocode value associated with the user request. A more detailed discussion of a Node Priority Table is provided below with reference to FIG. 10B, in accordance with one embodiment of the invention.

Determining a node to serve a client request will now be described with reference to FIGS. 10A-10D in accordance with various embodiments of the present invention.

FIG. 10A illustrates storage nodes A and B located at separate geographic locations. For example, storage node A may be located in California while storage node B is located in New York. Geocodes 1-6 are assigned to predetermined geographic regions defined by circular boundaries having predetermined radii centered about each node. First and second circular boundaries surrounding node A are defined by circles 1002 and 1004, respectively. Third and fourth circular boundaries surrounding node B are defined by circles 1006 and 1008, respectively. In one embodiment, the boundaries having the smaller radii 1002 and 1006 represent areas that can be considered “closest” to a respective node, and the boundaries having the larger radii 1004 and 1008 can be considered “close enough” to a respective node. Although FIG. 10A illustrates regions defined by circular boundaries, it is appreciated that various shaped boundaries can be used to define geocode regions, such as rectangular shapes. Moreover, geocode regions need not even be defined by particular shapes, but may be defined by other criteria, such as quality of service considerations, latency times, etc. As shown in FIG. 10A, the circles 1002, 1004, 1006 and 1008 define various geographic regions 1-6 with respect to the nodes A and B which may be translated or correlated to geocodes, or geocode regions, in accordance with one embodiment of the invention. A first geocode region 1 corresponds to an area of intersection between circles 1002 and 1004. A second geocode region 2 corresponds to the area within circle 1002 minus region 1. A third geocode region 3 corresponds to the area within circle 1004 minus region 1. Similarly, a fourth geocode region 4 corresponds to an area of intersection between circles 1004 and 1008. A fifth geocode region 5 corresponds to an area within circle 1004 minus regions 2 and 4 and a sixth geocode region 6 corresponds to an area within circle 1008 minus regions 3 and 4.

FIG. 10B illustrates a Node Priority Table 1070 associated with the geographic regions of FIG. 10A, in accordance with one exemplary embodiment of the invention. The Node Priority Table 1070 identifies a priority order for a plurality of nodes to which core system 102 may send user requests based on which geocode region (e.g., 1-6) a user is calling from. The Node Priority Table 1070 includes a Geocode ID column 1072, a Priority ID column 1074, and a Node ID column 1076. The Geocode ID column 1072 is populated by the geocode region IDs (e.g., 1-6) of FIG. 10A. The Priority ID column 1074 is populated by values indicating a node access priority associated with each node in each geocode region. The Node ID column 1076 is populated by values identifying a storage node, e.g., A or B, which has been designated to service various geocode regions in accordance with a predetermined priority order of selection. Based upon which geocode region an end user 114 is calling from, a particular node can be determined to be a “near enough” or a closest node to the end user 114 using the Node Priority Table 1070. In one embodiment, the Node Priority Table 1070 is stored in the core database server 102 which uses the table to select one or more available nodes to which a user's file request is redirected.

By prioritizing nodes with respect to different geographic regions, various algorithms may be implemented to select particular nodes to service user requests originating from various geographic regions. In this example, geographic proximity is a primary factor in determining node selection for a particular user request. However, as would be apparent to those of skill in the art, various additional factors such as server latencies, server performance, quality of service, how busy one node is when compared to another node, etc. may be taken into account and implemented in the node priority table and/or algorithms for selecting nodes to service user requests. In the present example geocode regions shown in FIG. 10A, a geocode ID region that falls within a “closest” radius from a node may be assigned a priority of “1” with respect to that node (i.e., a highest priority value). Moreover, a geocode ID region that is outside the “closest” radius, but falls within the “close enough” radius of a node, may be assigned a priority of “2” with respect to that node. Regions falling outside of the “near enough” radius may be assigned a priority of “3”.

Thus, as shown in FIG. 10B, geocode ID regions 1 and 2 are considered “closest” to node A. Accordingly, Node Priority Table 1070 has priority “1” values assigned under the Priority ID column 1074 associated with node A in geocode ID regions 1 and 2. Geocode ID regions 4 and 6 fall outside of the “closest” radius, but fall within the “near enough” radius. Accordingly, Node Priority Table 1070 has priority “2” values assigned under the Priority ID column 1074 associated with node A in geocode ID regions 4 and 6. Geocode ID region 5 falls outside of the “near enough” radius of node A, and therefore is assigned a priority “3” value under the Priority ID column 1074 associated with node A. The priority IDs are assigned in a similar fashion for node B. Note that some geocodes can have the same Priority ID values for both nodes. In such cases, the node selected to direct a request to can be determined based on various factors, such as which node is less busy or other performance-based factors.
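
As one possible sketch of how such priority values could be assigned, the following derives rows of (Geocode ID, Priority ID, Node ID) from region-to-node distances and the two radii. The distances and radii used with it are hypothetical and are not the values of FIG. 10B; the names priority_for and build_priority_table are likewise assumptions.

```python
def priority_for(distance: float, closest_radius: float,
                 near_enough_radius: float) -> int:
    """Map a region-to-node distance to a Priority ID (1 is highest)."""
    if distance <= closest_radius:
        return 1  # within the "closest" radius
    if distance <= near_enough_radius:
        return 2  # outside "closest" but within "near enough"
    return 3      # outside the "near enough" radius


def build_priority_table(region_distances, closest_radius, near_enough_radius):
    """Build rows of (Geocode ID, Priority ID, Node ID).

    region_distances -- dict: geocode id -> {node id: distance to that node}
    """
    rows = []
    for geocode_id in sorted(region_distances):
        for node_id, distance in sorted(region_distances[geocode_id].items()):
            priority = priority_for(distance, closest_radius, near_enough_radius)
            rows.append((geocode_id, priority, node_id))
    return rows
```

A region equidistant from two nodes receives the same Priority ID for both, which is the tie case noted above that may be resolved by load or other performance-based factors.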

It is understood that geocode regions may be defined in any desired manner to achieve desired performance goals. For example, geocode regions may be defined by longitudinal boundaries in accordance with one embodiment. FIG. 10C illustrates exemplary distributed storage nodes A, B and K located at geographically separate locations around the world. The world is divided into exemplary geocode ID regions 1-6 based on longitudinal boundaries indicated by dashed lines. In the embodiment of FIG. 10C, a geographic area is divided based upon longitudinal boundaries, but it is appreciated that the geographic areas can be divided using zip codes, country codes, and the like. A Node Priority Table can then have a priority ID value for each geocode region 1-6 assigned to some or all of the nodes A, B and K. The priority value can be based on various criteria, including distance from a node to the geocode region and connectivity performance between the geocode region and the node, for example.

FIG. 10D illustrates an exemplary node selection process 1080. The various tasks performed in connection with process 1080 may be implemented by software, hardware, firmware, a computer-readable medium storing computer executable instructions for performing the process method, or any combination thereof. It should be appreciated that process 1080 may include any number of additional or alternative tasks. The tasks shown in FIG. 10D need not be performed in the illustrated order, and process 1080 may be incorporated into a more comprehensive procedure or process having additional functionality not described in detail herein. For illustrative purposes, the following description of process 1080 may refer to elements mentioned above in connection with FIGS. 1-7A. In various embodiments, portions of process 1080 may be performed by different elements of systems 100-700, such as core system 102, customer application interface 110, and the distributed storage delivery nodes 112A, 112B-112K. Tasks of process 1080 may be performed as backend processes that are transparent to the end user 114.

For illustrative purposes, the following discussion describes a user download request. It is appreciated that process 1080 may be equally applicable to a file upload request with minor modifications. At step 1081, a user request to download a file is received by the core system 102. In one embodiment, the user request includes an IP address of the user's device and a virtual path name of the file being requested.

Next, at step 1082, the core system 102 determines available nodes that contain the requested file. This step is performed by correlating the virtual path name with an LFID as described above with reference to FIGS. 4A-5A. The LFID can then be used to identify which nodes contain the file and which of those nodes are available (e.g., online) using the Logical Node Table of FIG. 5B.

The core system 102 then determines a priority of the available nodes that contain the file in step 1083. This is done by correlating the available nodes that contain the file with the Node Priority Table 1070 (FIG. 10B) and the geocode ID associated with the region from which the user is calling. The available node that contains the file and has the lowest Node Priority ID value is determined to be the highest priority node. Thus, a node having a priority ID value of “1” is determined to be a top priority node, etc.

In step 1084, the user is redirected to the available node that contains the file and has the highest node priority (i.e., the lowest Node Priority ID value). For the purposes of this example, this node can be referred to as the “first node.” The first node then transmits the requested file to the user in step 1085.

Synchronously or asynchronously with transmitting the file to the user in step 1085, the first node determines if it is an appropriate node at decision step 1086. In one embodiment, the first node determines if it is an appropriate node based on whether the user's IP address or address range, which the first node obtained from the user, is on a serve list contained in the first node. If the user's IP address is not on the serve list, then the first node is not an appropriate node. In other embodiments, this determination need not be based on a user's IP address, but can instead be based on various criteria, including the user's geocode.
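As one way to picture the serve-list check of decision step 1086, the sketch below tests a user's IP address against a list of addresses and address ranges using Python's standard `ipaddress` module. The function name and the representation of the serve list as CIDR strings are assumptions for illustration; the text does not specify how the serve list is stored.

```python
import ipaddress


def is_appropriate_node(user_ip: str, serve_list: list[str]) -> bool:
    """Return True if user_ip matches any address or range on the serve list.

    Each serve-list entry is assumed to be a single address ("198.51.100.7")
    or a CIDR range ("192.0.2.0/24").
    """
    addr = ipaddress.ip_address(user_ip)
    return any(
        addr in ipaddress.ip_network(entry, strict=False)
        for entry in serve_list
    )


# Hypothetical serve list using documentation-reserved address blocks.
serve_list = ["192.0.2.0/24", "198.51.100.7"]
```

A node for which this check returns False would proceed to notify the core system, as described for step 1088.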

If the first node is determined to be an appropriate node, then process 1080 may end at step 1087.

If the first node determines that it is not an appropriate node, then it notifies the core system 102 that it is not an appropriate node in step 1088. The core system 102 then determines a “best node” to serve further download requests from that user in step 1089. The “best node” can be determined based on various criteria including policies set forth in a controlling SLA. As an example, a controlling SLA may specify a particular node, in which case that node would be considered the best node. As another example, the controlling SLA may specify that the best node is any node that can best serve the user if that node has a copy of the file. In various embodiments, the determination of which node can best serve users can be based on, for example, usage patterns of the various nodes, geographic proximity of the various nodes to a user, latency measures, and quality of service requirements for the user as specified, for example, in the user's SLA.

Next, the core system 102 instructs the best node to get a copy of the file in step 1090. Subsequent requests for the file can then be directed to the best node in step 1091. It is appreciated that one benefit of the above process is that the node off-loads processing requirements from the core server 102 by determining whether it is an appropriate node to service a user request (step 1086). As mentioned above, this determination can be based on a variety of predetermined criteria (e.g., whether the IP address of the user is on a “serve list,” latency considerations, distance considerations, quality of service associated with the request, etc.). In most instances it is contemplated that the selected node will be an appropriate or acceptable node to process a request and, therefore, the node will not need to bother the core server. Only in rare instances will the node notify the core that it is not an appropriate or acceptable node to service a particular request. In this way, the core server 102 does not need to perform an inquiry for every request that is transmitted to it concerning whether a selected node is an appropriate or acceptable node. It simply redirects a request to a nearest available node containing the requested file and thereafter assumes the node will handle the request. The core server 102 is only notified if there is a problem and thereafter takes appropriate action.

An exemplary environment in which an inter-node balancing process may be implemented is described with reference to FIG. 11A below, in accordance with one embodiment of the invention. As shown in FIG. 11A, SDN system 100 includes a core system 102 communicatively coupled to four distributed storage delivery nodes 112A, 112B, 112C and 112D. For the purposes of this example, end user device 1102 is calling from a location closest to storage delivery node 112A and a governing SLA dictates that files requested by the end user device 1102 be moved to a storage node located “closest” to the end user device 1102 at storage delivery node 112A.

As used herein, the term “closest” does not necessarily mean the node is the closest node in terms of absolute distance. The term can also be used to refer to a node that is better suited for connection with the end user because, for example, the connection between the user and the node will result in better performance (e.g., higher data transmission rate) versus another node. Furthermore, a “closest” node may, in fact, be further away than another node, yet still be determined to be a “closest” node due to design efficiencies, and/or relative performance capabilities of the various nodes, and/or the relative load (e.g., number of requests being handled) of the various nodes. Such design efficiencies and/or operation parameters may take into account the ease of managing which nodes users can access as opposed to requiring a strict absolute distance based analysis.

FIG. 11B illustrates a flowchart of an exemplary inter-node load balancing process 1120 that can be performed in the environment of FIG. 11A in accordance with one embodiment of the invention. The various tasks performed in connection with process 1120 may be implemented by software, hardware, firmware, a computer-readable medium storing computer executable instructions for performing the process method, or any combination thereof. It should be appreciated that process 1120 may include any number of additional or alternative tasks. The tasks shown in FIG. 11B need not be performed in the illustrated order, and process 1120 may be incorporated into a more comprehensive procedure or process having additional functionality not described in detail herein. For illustrative purposes, the following description of process 1120 may refer to elements mentioned above in connection with FIGS. 1-11A. In various embodiments, portions of process 1120 may be performed by different elements of systems 100-1100, such as core system 102, customer application interface 110, and the distributed storage delivery nodes 112A, 112B-112K. Tasks of process 1120 may be performed as backend processes that are transparent to the end user 114.

Process 1120 may begin by an end user calling (via the end user device 1102) into the core system 102 and requesting a file (task 1122). The request can comprise the end user's IP address and information corresponding to a virtual path of the requested file. The virtual path name is described in more detail with reference to FIGS. 3 and 4. The core system 102 then translates the virtual path name to its corresponding LFID using tables stored in VFS 105 and LFS 107 (FIG. 1). Thereafter, the core system 102 identifies all the storage nodes in which the file is stored using LFS 107 (task 1124). Next, the identified storage nodes are prioritized by sorting the Node Priority Table 1070 (FIG. 10B) and taking into account the user's SLA (task 1126). The core then determines which of the sorted nodes are “near enough” (e.g., priority 2 or better) (task 1128). Optionally, the core 102 determines whether any of the identified “near enough” nodes has recently updated its current access count (CAC), which is the number of requests a node is currently handling (task 1130). If the answer to inquiry 1130 is “no,” then the core 102 directs the user request to the nearest of the near enough nodes (task 1132), after which process 1120 ends.

If the answer to inquiry 1130 is “yes,” then the core server 102 determines whether the nearest of the near enough nodes is too busy (i.e., whether its CAC is over a threshold) (task 1134). It is appreciated that tasks 1130 and 1132 are optionally implemented by the core in order to potentially bypass tasks 1134-1146, thereby saving processing bandwidth at the core 102, in accordance with one embodiment of the invention. If optional tasks 1130 and 1132 are omitted, then inquiry task 1134 immediately succeeds task 1128 in process 1120. If the answer to inquiry 1134 is “no,” then the core 102 directs the user request to the nearest node (task 1136) and process 1120 ends. If the answer to inquiry 1134 is “yes,” the core inquires whether any of the other “near enough” nodes are less busy (task 1138). If the answer to inquiry 1138 is “no,” then the core 102 directs the user request to the previously identified nearest node (task 1140) where it is queued for handling. Next, the core determines if there is another near enough node to copy the file to (task 1142). If so, the core initiates a “file walking” process by instructing the new “near enough” node to copy the file from one of the previously identified nodes containing the file (task 1144).

If the answer to inquiry 1138 is “yes,” the core 102 directs the user request to the “near enough” node with the lowest current access count (CAC) (task 1146). If there is only one “near enough” node containing the file that is less busy than the nearest node, then the user request is automatically directed to that “near enough” node.
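The routing decisions of tasks 1128 through 1146 can be summarized in a short sketch. The class and function names, the representation of a missing CAC report as `None`, and the treatment of an unreported CAC as zero are assumptions for illustration; the thresholds reuse the example values given in the text (priority 2 or better, 100 requests). The “file walking” step of tasks 1142-1144 is omitted.

```python
from dataclasses import dataclass
from typing import Optional

NEAR_ENOUGH_PRIORITY = 2   # "priority 2 or better"
CAC_THRESHOLD = 100        # example "too busy" threshold


@dataclass
class NodeInfo:
    name: str
    priority: int               # lower value means nearer to the user
    cac: Optional[int] = None   # last CAC reported to the core, if any


def route_request(nodes: list[NodeInfo]) -> Optional[NodeInfo]:
    """Pick the node that should serve a request, per tasks 1128-1146."""
    # Tasks 1126-1128: sort by priority and keep the "near enough" nodes.
    near = sorted(
        (n for n in nodes if n.priority <= NEAR_ENOUGH_PRIORITY),
        key=lambda n: n.priority,
    )
    if not near:
        return None  # Assumption: this case is not covered by the text.
    nearest = near[0]

    # Task 1130: has any near-enough node recently reported its CAC?
    if all(n.cac is None for n in near):
        return nearest                                    # task 1132

    # Task 1134: is the nearest node too busy?
    if nearest.cac is None or nearest.cac <= CAC_THRESHOLD:
        return nearest                                    # task 1136

    # Task 1138: is any other near-enough node less busy?
    # Assumption: a node with no recent report is treated as CAC 0.
    less_busy = [n for n in near[1:] if (n.cac or 0) < nearest.cac]
    if not less_busy:
        return nearest                                    # task 1140 (queued)
    return min(less_busy, key=lambda n: n.cac or 0)       # task 1146
```

For example, with node 112A at priority 1 reporting a CAC of 150 and node 112B at priority 2 reporting a CAC of 40, the request is routed to 112B.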

FIG. 11C illustrates a supplemental inter-node balancing process 1150 which is implemented by a storage node, in accordance with one embodiment of the invention. Whenever the user's request is redirected to a storage node (e.g., tasks 1132, 1136, 1140 or 1146 of FIG. 11B), the storage node will receive the user's request (task 1152) and thereafter serve the file to the end user (task 1154). After completing the transfer of the file to the user, the node decrements its current access count (CAC) by 1 (task 1156) and then determines whether its CAC has crossed a threshold indicating that the node is no longer “too busy” (inquiry task 1158). If the answer to inquiry 1158 is “no,” then there has not been a change of status of the node and the process 1150 ends. If the answer to inquiry 1158 is “yes,” this means that the node was previously “too busy” but is no longer “too busy.” Therefore, the node notifies the core 102 that it is no longer “too busy” by updating the core 102 with its node CAC value (task 1166).

Immediately upon receiving a request from a user, the node increments its CAC by 1 (task 1160). Next, concurrently with processing the user request, the node determines whether its CAC value is above a predetermined threshold value (e.g., 100 requests) (task 1162). If the answer to inquiry 1162 is “no,” then the node is not “too busy” and the node need not notify the core. If the answer to inquiry 1162 is “yes,” then the node determines whether the core 102 was previously notified of its “too busy” status within a predetermined duration of time T (task 1164). If the answer to inquiry 1164 is “yes,” then the core 102 already knows of the current “too busy” status of the node and no further notification is needed. If the answer to inquiry 1164 is “no,” then the node notifies the core 102 that it is “too busy” by updating the core 102 with its CAC value (task 1166). Thus, in this embodiment, the node notifies the core when its status changes from “too busy” to “not too busy” and further notifies the core if its status is “too busy” and the core has not been alerted of its “too busy” status within a predetermined time period.
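The node-side bookkeeping of process 1150 can be sketched as a small class. The class name, the callback used to reach the core, and the injectable clock are assumptions for illustration; the task numbers in the comments refer to the tasks described above.

```python
import time
from typing import Callable, Optional


class NodeLoadTracker:
    """Tracks a node's current access count (CAC) and decides when to
    notify the core. `notify_core` is a hypothetical callback that
    receives the node's current CAC value."""

    def __init__(
        self,
        threshold: int,
        renotify_after: float,            # the duration T, in seconds
        notify_core: Callable[[int], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.renotify_after = renotify_after
        self.notify_core = notify_core
        self.clock = clock
        self.cac = 0
        self.last_busy_notice: Optional[float] = None

    def on_request_received(self) -> None:
        self.cac += 1                                   # task 1160
        if self.cac <= self.threshold:                  # task 1162: not too busy
            return
        now = self.clock()
        recently_notified = (                           # task 1164
            self.last_busy_notice is not None
            and now - self.last_busy_notice < self.renotify_after
        )
        if not recently_notified:
            self.notify_core(self.cac)                  # task 1166
            self.last_busy_notice = now

    def on_transfer_complete(self) -> None:
        was_too_busy = self.cac > self.threshold
        self.cac -= 1                                   # task 1156
        if was_too_busy and self.cac <= self.threshold:  # task 1158
            self.notify_core(self.cac)                  # task 1166
            self.last_busy_notice = None
```

With a threshold of 2, the third concurrent request triggers a “too busy” notification, and the completion that brings the count back to 2 triggers the “no longer too busy” notification.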

FIG. 11D illustrates an exemplary intra-node load balancing process combined with an inter-node load balancing process 1170 in accordance with one embodiment of the present invention. The various tasks performed in connection with process 1170 may be implemented by software, hardware, firmware, a computer-readable medium storing computer executable instructions for performing the process method, or any combination thereof. It should be appreciated that process 1170 may include any number of additional or alternative tasks. The tasks shown in FIG. 11D need not be performed in the illustrated order, and process 1170 may be incorporated into a more comprehensive procedure or process having additional functionality not described in detail herein. For illustrative purposes, the following description of process 1170 may refer to elements mentioned above in connection with FIGS. 1-7A. In various embodiments, portions of process 1170 may be performed by different elements of systems 100-700, e.g., core system 102, the customer application interface 110, the distributed storage delivery nodes 112, etc.

Upon receiving a download request (task 1172) for a file, a download server 706 (FIG. 7) at a designated storage node may determine which of a plurality of storage servers within the node 112 is least busy (task 1174). Least busy may be measured by, for example, the lowest number of current accesses or accesses within a predetermined period of time. The server with the lowest number of accesses may then be used to serve the download request (task 1178). In one embodiment, immediately upon receiving the transfer request, the identified server's CAC is incremented by one (task 1176) to indicate it is currently handling an additional access request. After the server completes serving the request, its CAC is decremented by one. Each time the CAC is incremented or decremented the node stores a last update date (LUD) time stamp for that LFID's CAC to determine when the CAC was last changed and, hence, the number of access requests within a predetermined period of time. In this manner, the storage node performs “intra-node load balancing” among the plurality of storage servers within the storage node 700 by directing a request to a “least busy” storage server in the storage node 700. In other words, the number of simultaneous requests handled by a node is evenly distributed amongst a plurality of storage servers within the node such that no one server works harder on average than another server. It is appreciated that this type of intra-node load balancing reduces service latencies and optimizes node performance and server longevity.
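The intra-node selection of tasks 1174-1178 can be illustrated with a minimal sketch. The class and function names are hypothetical, and the sketch simplifies the text by keeping one CAC and one last update date (LUD) per storage server rather than per LFID.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class StorageServer:
    name: str
    cac: int = 0                             # current access count
    last_update: Optional[datetime] = None   # LUD time stamp

    def adjust_cac(self, delta: int) -> None:
        """Change the CAC and record when it was last changed."""
        self.cac += delta
        self.last_update = datetime.now(timezone.utc)


def serve_download(
    servers: list[StorageServer],
    send_file: Callable[[StorageServer], None],
) -> StorageServer:
    """Serve one request from the least busy server in the node."""
    server = min(servers, key=lambda s: s.cac)   # task 1174
    server.adjust_cac(+1)                        # task 1176
    try:
        send_file(server)                        # task 1178
    finally:
        server.adjust_cac(-1)                    # transfer finished
    return server
```

The `try`/`finally` is a design choice of the sketch, so that a failed transfer does not leave the count permanently inflated; the text itself only states that the CAC is decremented after the request is served.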

Next, the node determines whether it is “near enough” or “local” to the end user's device by comparing a geocode value associated with the user's IP address to its own geocode or, alternatively, by simply determining whether the user IP address is listed on its “serve list,” as described above (inquiry task 1182). If the first storage node is determined to be “near enough” (“Yes” branch of inquiry task 1182), then the first storage node compares the number of requests it is handling to a predetermined threshold (inquiry task 1184). If the number of requests exceeds the threshold (“No” branch of inquiry task 1184), then the first storage node is determined to be too busy and the file is copied to a second storage node (task 1186). In one embodiment, the node notifies the core 102 that it is too busy, as described above, and the core thereafter instructs a second node to copy the file from the original node. Alternatively, in another embodiment, the original node can automatically identify a new node that is near enough to the user and instruct the new node to copy the requested file. It is appreciated, however, that this latter embodiment requires more information and logic to be stored at the node. Subsequent requests for the file can then be directed to the second storage node to offset some of the load of the first storage node. Thus, storage nodes within the network can perform inter-node load balancing as well. If the number of requests does not exceed the threshold (“Yes” branch of inquiry task 1184), then the first storage node continues to process further file requests for that file.

Referring back to inquiry task 1182, if the storage node determines that it is not a proper node to serve the requester (“No” branch of inquiry task 1182), then the storage node notifies the core system 102. The core system 102 then determines the nearest storage node that contains the requested file based on the IP address of the end user (task 1188). A distance between the end user and the node containing the file is compared with a predetermined threshold in decision task 1190. If the threshold is not exceeded (“Yes” branch of decision task 1190), then the node identified in task 1188 processes the request and process 1170 ends. Alternatively, the original node processes the current request but all subsequent requests for the same file by the same user, or by a user similarly located, are processed by the new node. If the threshold is exceeded (“No” branch of decision task 1190), then the core system 102 directs the original storage node to send the file to the nearest storage node identified in task 1188 for storage (task 1192). The new nearest storage node containing the file may then process the file and notify the LFS 107 of the file's existence at the node.

When a file is stored at two or more storage servers within a node, it may be desirable to delete the file at one or more of the storage servers for de-duplication purposes. In one embodiment, a cleanup program determines if it is no longer necessary to store one or more redundant files within a node based on a current access count (CAC) associated with the LFID for the file. FIG. 12 is a flow chart of an exemplary cleanup process 1200 associated with intra-node load balancing, in accordance with one embodiment of the present invention. The various tasks performed in connection with process 1200 may be implemented by software, hardware, firmware, a computer-readable medium storing computer executable instructions for performing the process method, or any combination thereof. It should be appreciated that process 1200 may include any number of additional or alternative tasks. The tasks shown in FIG. 12 need not be performed in the illustrated order, and these processes may be incorporated into a more comprehensive procedure or process having additional functionality not described in detail herein. For illustrative purposes, the following description of process 1200 may refer to elements mentioned above in connection with FIGS. 1-9. In various embodiments, portions of process 1200 may be performed by different elements of systems 100-700, e.g., core system 102, the customer application interface 110, the distributed storage delivery nodes 112, etc.

Process 1200 may begin by counting a total current access count (CAC) for an LFID associated with a file (task 1210). In this context, “current access count” refers to a number of times a logical file is currently being accessed. If the total current access count of the LFID divided by the number of physical files currently associated with the LFID is not lower than a predetermined threshold (“No” branch of inquiry task 1212), then process 1200 ends and no files are deleted. If the total current access count of the LFID divided by the number of physical files currently associated with the LFID is lower than the threshold (“Yes” branch of inquiry task 1212), process 1200 determines how many files should be cleaned up by subtracting the rounded-down quotient of the threshold divided by the CAC from the current physical file count (task 1214). For example, if the total CAC is equal to 10, the threshold is equal to 12, and the number of physical files associated with the LFID is equal to 2, then the number of files to be deleted is equal to 2−(rounded-down quotient of 12/10)=1. Thus, in this example, the number of redundant files to be deleted is equal to 1. Next, the corresponding number of physical files, which have the lowest CAC associated with them, are marked offline so no new connections are made to those files and existing connections are closed after the current transfers are completed (task 1216). A separate process determines which files are marked offline and are no longer being accessed (i.e., CAC is equal to zero) (task 1218) and deletes those files from the disk (task 1220).
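The arithmetic of inquiry task 1212 and task 1214 can be checked with a short sketch. The function name and the `min_files` parameter are hypothetical. The handling of a total CAC of zero and the clamping of the result to a non-negative count that leaves at least `min_files` copies are assumptions, since the text does not address those cases.

```python
def files_to_delete(
    total_cac: int,
    physical_files: int,
    threshold: int,
    min_files: int = 1,
) -> int:
    """Number of redundant physical files to mark offline for one LFID."""
    if physical_files <= min_files:
        return 0
    if total_cac == 0:
        # Assumption: with no current accesses, keep only min_files copies.
        return physical_files - min_files
    # Inquiry task 1212: average accesses per physical file vs. threshold.
    if total_cac / physical_files >= threshold:
        return 0
    # Task 1214: file count minus rounded-down (threshold / total CAC).
    count = physical_files - threshold // total_cac
    # Assumption: never negative, and never drop below min_files copies.
    return max(0, min(count, physical_files - min_files))


# Worked example from the text: CAC = 10, threshold = 12, 2 physical files.
example = files_to_delete(total_cac=10, physical_files=2, threshold=12)
```

Here `example` evaluates to 1, matching the worked example: 10/2 = 5 is below the threshold of 12, and 2 − ⌊12/10⌋ = 1.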

In a further embodiment, the minimum number of physical files may be set to a number greater than one. In this scenario, the formula for task 1214 may be modified to ensure that a specific number of physical files are always maintained for each logical file. As would be understood by one of ordinary skill in the art, any threshold value may be selected based on the operating and/or performance capabilities of the storage servers within the node, to control the loads on each server. In this manner, the number of redundant copies of a file stored in a node is continuously and automatically adjusted based at least in part on the number of access requests for that file and the operating/performance capabilities of the servers within the node.

Although the internet media file system is described in the context of storing, accessing and manipulating files via the internet, it is understood that the invention is applicable within any type of communications network (e.g., LAN, WAN, etc.). However, for illustrative purposes, the data file system and method of the present invention is described as an internet media file system (IMFS). The IMFS 108 can have a variety of functions and uses. Some exemplary uses are discussed below.

As an example, although the IMFS 108 has been described above in connection with SDN storage nodes 112, it is understood that the IMFS 108 may be used with various types of physical storage devices having a variety of storage network configurations. Thus, the IMFS 108 need not be used exclusively with distributed storage delivery nodes 112, but can be used with other types of memory devices as well.

The IMFS 108 is a file system that can enable users to store, retrieve, and manipulate files from a remote location using a rich set of Web Service APIs. File system operations require a caller (i.e., a requester such as one of the end users 114) to be authenticated. For example, calls into IMFS 108 may require a session token which can be obtained by a logical call. In general, paths can be specified as either absolute or relative to an account's root folder.

The following are some exemplary Web Service interfaces for IMFS.

-   A CopyFiles function is used to copy a file from one location to another. The CopyFiles function can be used to copy one or more files to a given folder.
-   A CopyFolders function is used to copy a folder from one location to another. The CopyFolders function can be used to copy one or more folders.
-   A CreateFolders function is used to create a new folder at the specified location.
-   A DeleteFiles function is used to remove one or more files.
-   A DeleteFolders function is used to remove one or more folders.
-   A ListFolder function is used to page the content of a given folder.
-   A MoveFiles function is used to move a file from one location to another. The MoveFiles function can be used to move one or more files to a given folder.
-   A MoveFolders function is used to move a folder from one location to another. The MoveFolders function can be used to move one or more folders.
-   A RenameFile function is used to rename a file from one name to another.
-   A RenameFolder function is used to rename a folder from one name to another.

In one embodiment, the IMFS 108 can correlate the physical files with their corresponding customers. Thus, the IMFS 108 can keep track of what content is stored in the distributed storage delivery nodes 112, where it is stored in the distributed storage delivery nodes 112, and who has access to the content. The IMFS 108 may map the customer to an IMFS Web Services in order to keep track of a customer's file and provide access for the customer and/or customer's clients.

FIG. 13 illustrates an exemplary IMFS data flow 1300 in accordance with one embodiment of the invention.

As shown in FIG. 13, an end user 1302 can make a request to the IMFS Web Services 1304 to access the IMFS. IMFS Web Services 1304 may provide a set of APIs that can allow an end user 1302 to upload files to their IMFS and to manipulate the metadata. An exemplary method of providing the APIs is using the SOAP protocol; however, an HTTP upload interface will also be provided. The metadata types may include, without limitation, image files, width, height, video file, duration, bit rate, frame rate, audio files, title, artist, album, genre, track, and the like. The IMFS can have many functions to manipulate metadata, including, without limitation:

-   A DeleteAllMetadata function for removing all metadata from a file.
-   A DeleteMetadata function for removing specified metadata from a file.
-   A GetMetadata function for retrieving all metadata from a file.
-   A SetMetadata function for setting specified metadata for a file.
-   A DeleteAllTags function for removing all tags from a file.
-   A DeleteTags function for removing specified tags from a file.
-   A GetTags function for retrieving all tags from a file.
-   A SetTags function for setting specified tags for a file.

IMFS Web Services 1304 may include interfaces to the IMFS to allow end users 1302 to, for example, upload, append, copy, delete, move, and rename files and folders. In one embodiment, the IMFS Web Services 1304 may implement the industry standard REST and SOAP protocols for implementing the APIs to the functions. The interfaces to the IMFS may include, without limitation:

-   A CopyFiles function used to copy a file from one location to another. The CopyFiles function can be used to copy one or more files to a given folder.
-   A CopyFolders function used to copy a folder from one location to another. The CopyFolders function can be used to copy one or more folders.
-   A CreateFolders function used to create a new folder at the specified location.
-   A DeleteFiles function used to remove one or more files.
-   A DeleteFolders function used to remove one or more folders.
-   A ListFolder function used to page the content of a given folder.
-   A MoveFiles function used to move a file from one location to another. The MoveFiles function can be used to move one or more files to a given folder.
-   A MoveFolders function used to move a folder from one location to another. The MoveFolders function can be used to move one or more folders.
-   A RenameFile function used to rename a file from one name to another.
-   A RenameFolder function used to rename a folder from one name to another.

Furthermore, end users 1302 can retrieve a listing of their files and also associate user defined tags and metadata.

With further reference to FIG. 13, the IMFS Web Services 1304 may communicate with an API database 1308 to obtain the IMFS Web Services APIs. After a device used by the end user 1302 receives an IMFS Web Services API, the device may use the API to access files through the command processing servers 214 (FIG. 2). Unless the end user 1302 is requesting to append or upload a file, the IMFS Web Services 1304 returns the IMFS Web Services API to the end user 1302 as a response to the request.

If the end user 1302 requests to upload or append a file with, for example, an “Upload File” command, then the IMFS Web Services 1304 writes portions (e.g., bytes) of the user's file to permanent storage 1310. The IMFS Web Services 1304 may then submit the “Upload File” command to the message queuing service 1312 (as explained in more detail below), and return a response to the end user 1302 with the status of the command.

The “Upload File” command may be used to upload a file in its entirety. If the path does not exist it can be created. The maximum file size for uploading a file using this command may be, for example, about 2 GB. If the file is larger than about 2 GB, then the append file method may be used. For example, if the filename is “Vacations/2007/Hawaii/beachDay1.jpg”, then when the file is done uploading, the file would be added to the file system as “Vacations/2007/Hawaii/beachDay1.jpg”. The IMFS Web Services 1304 may create the folders that do not exist in this scenario using standard operating system file operations. The “Append File” command can be used to add data to the uploaded file in parts.

When an end user 1302 uploads a file using the API's append file method or upload file method, there may be other actions that occur within the IMFS Web Services 1304. For example, as soon as the last portion (i.e., last byte) of the file has been written to the permanent storage 1310, the IMFS Web Services 1304 may interact with the database 1308 and update the end user's file system. At that point, the end user 1302 may have complete access to their file. The end user 1302 can download, copy, move, delete, rename, and set tag and metadata information for the file. The command processing service 1314 may process this file, and extract industry standard metadata from image, video, audio files, and the like.

In one embodiment, the command processing service 1314 can be a Windows Service operable to be a scalable and extensible solution for executing system wide tasks for the IMFS Web Services 1304. In alternative embodiments, the command processing service 1314 can be implemented as an operating system daemon operable to be a scalable and extensible solution for executing system wide tasks for the IMFS Web Services 1304. The service 1314 can function as a generic framework for computations that can be completed asynchronously.

In one embodiment, a web-based tool may allow the IMFS to get a real-time snapshot of all activity occurring on a given server running the command processing service 1314. This can be very beneficial for troubleshooting purposes, and to have an overall view of the number of files that are being uploaded over time.

One of the purposes of the command processing service 1314 is, for example, to calculate the MD5 hash for the purpose of physical file de-duplication as explained above. It can also be responsible for extracting metadata from image, video, and audio files in order to provide the end user 1302 with more information about their files. Examples of this type of metadata are image width and height, video frame rate, the artist and album for an audio file, and the like.

The command processing service 1314 may function to run regularly scheduled maintenance jobs for customers (end users) who have unreported usage, clean up aborted upload files, and provide system resource information such as available storage to the IMFS database 1308.

The command processing service 1314 may run on one or more servers located throughout various nodes. As processing requirements grow, processing servers can easily be added to assist in balancing the system 100 load. All processing servers running the command processing service 1314 may be independent from any other processing server (i.e., one processing server may have no idea that any other processing server exists). Load balancing amongst storage node servers or between storage nodes may be automatic, as explained above.

The command processing service 1314 may wait for a command, and then execute it. When it is not executing a command, it may be idle. The mechanism by which the command processing service 1314 receives these commands is a queuing service such as queuing service 1312. In one embodiment, the queuing service 1312 may comprise an MSMQ service. The queuing service 1312 may be configured in a clustered set of nodes in the node with complete failover capability. Therefore, if one of the queuing service cluster nodes happened to fail, it would automatically fail over to another storage delivery node without any data loss. The queuing service 1312 may also be configured to have data recovery if for some reason the queuing service 1312 needs to be stopped and/or restarted. All data currently stored in the queue is automatically serialized to disk.

As mentioned above, a command may be sent to the queuing service 1312 from the IMFS Web Services 1304 when the end user 1302 uploads a file, as will be explained below. Once a command arrives at the queuing service 1312, it can automatically be retrieved by one command processing service 1314 that is available to receive that command for processing. In one embodiment, commands are asynchronously “pulled” by a command processing service 1314, not “pushed” to a command processing service 1314. Once a command is retrieved, it can automatically be removed from the queuing service 1312. Commands sent to the queuing service 1312 may have a priority associated with them. In other words, a command may be submitted to the queuing service 1312 and be moved “to the head of the line” so that it is received ahead of other commands already in the queuing service 1312. The command processing service 1314 may be operable to take full advantage of this feature.
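The pull model with priorities can be illustrated with an in-memory stand-in for the queuing service 1312. This is not MSMQ and does not model clustering or persistence; the class name, the numeric priority scale (a lower number is served first), and the default priority are assumptions for the example.

```python
import heapq
import itertools
from typing import Optional


class CommandQueue:
    """In-memory sketch of a queue that supports 'head of the line' commands."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        # A running sequence number keeps first-in, first-out order
        # among commands that share the same priority.
        self._sequence = itertools.count()

    def submit(self, command: str, priority: int = 10) -> None:
        heapq.heappush(self._heap, (priority, next(self._sequence), command))

    def pull(self) -> Optional[str]:
        """Called by a processing service; retrieving a command removes it."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


q = CommandQueue()
q.submit("File Ingestion")
q.submit("Media")
q.submit("Multi Node File Copy", priority=0)   # moved to the head of the line
order = [q.pull(), q.pull(), q.pull()]
```

In this run, `order` is `["Multi Node File Copy", "File Ingestion", "Media"]`: the high-priority command overtakes the two commands already queued, which keep their original relative order.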

Each command processing service 1314 can, for example, be initialized with about 10 processing threads on a given server. Therefore, each processing server can process about 10 commands simultaneously, and each command is executed asynchronously with respect to any other command. The number of processing threads is configurable. Once a processing thread has completed executing a command, it waits to receive another command from the queuing service 1312. The threads are either executing a command or waiting to receive another command until the service is terminated.
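
The threading behavior just described might be sketched as follows, using a thread-safe in-process queue in place of the queuing service 1312. The function name run_service, the sentinel _STOP used to terminate the threads, and the handler argument are all assumptions made for this illustration.

```python
import queue
import threading

NUM_THREADS = 10  # configurable, as the description notes
_STOP = object()  # hypothetical sentinel that tells a thread to terminate


def run_service(commands, handler, num_threads=NUM_THREADS):
    """Execute every command on a pool of threads that pull from one queue."""
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            command = work.get()  # blocks while waiting for the next command
            if command is _STOP:
                break
            outcome = handler(command)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for command in commands:
        work.put(command)
    for _ in threads:
        work.put(_STOP)  # one sentinel per thread ends the service
    for thread in threads:
        thread.join()
    return results


processed = run_service(range(25), lambda n: n * 2)
```

Each thread is either executing a command or blocked waiting for one, and the order in which results arrive is not guaranteed because the commands run independently of one another.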

Under optimal conditions, commands submitted to queuing service 1312 are taken off the queue to be processed immediately. However, under heavy load conditions, the processing servers may not be able to process all commands as fast as they are being submitted. As a result, commands may have to wait in the queue longer than desired before getting processed. In this case, additional processing servers can be added to further distribute the system load and reduce processing delays.

Standard commands may asynchronously be sent to the queuing service 1312 and be asynchronously executed by a command processing service 1314. The standard commands may include, without limitation: a “BaseCommand”, a “Scheduler” command, a “Media” command, a “File Ingestion” command, a “Multi Node File Copy” command, a “Partial File Update” command, an “Add Physical File” command, a “Get Upload Location” command, and the like.

A “Get Upload Location” command can be used to determine which distributed storage delivery nodes 112 a file may be uploaded to. The “Get Upload Location” command may return an IP address for the distributed storage delivery nodes 112 (FIG. 1) and an upload token.

It may be possible that a command submitted to the command processing service 1314 fails to execute. One scenario would be network congestion. If a command fails, the command processing service 1314 may resubmit this command to the queuing service 1312, but into a special separate queue designed for holding failed commands. Failed commands may not in any way affect a user's ability to download or manipulate the files. A failed command may instead mean that a file does not have an MD5 hash and its embedded metadata, if applicable, associated with it. Failed commands can be re-processed at an information technologist's discretion once the system/network problem has been resolved.
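
The failed-command handling above can be sketched with two in-memory queues. This is a simplified illustration under stated assumptions: the function names process_all and reprocess_failed, the command strings, and the simulated network flag are hypothetical and not taken from the described system.

```python
from collections import deque


def process_all(main_queue, failed_queue, execute):
    """Run every queued command; park failures in a separate queue."""
    while main_queue:
        command = main_queue.popleft()
        try:
            execute(command)
        except Exception as exc:
            # The failure is recorded and does not block other commands.
            failed_queue.append((command, str(exc)))


def reprocess_failed(main_queue, failed_queue):
    """Operator-triggered step: move failed commands back for another attempt."""
    while failed_queue:
        command, _reason = failed_queue.popleft()
        main_queue.append(command)


state = {"network_up": False}  # simulated network condition (assumption)
done = []


def execute(command):
    if command.startswith("md5") and not state["network_up"]:
        raise ConnectionError("network congestion")
    done.append(command)


main_q = deque(["copy:a", "md5:b", "copy:c"])
failed_q = deque()
process_all(main_q, failed_q, execute)
failed_after_first_pass = list(failed_q)

# Once the network problem is resolved, the failed commands are re-processed.
state["network_up"] = True
reprocess_failed(main_q, failed_q)
process_all(main_q, failed_q, execute)
```

The other commands complete on the first pass, and the parked command runs only after it is deliberately moved back, which mirrors re-processing at an operator's discretion.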

FIG. 14 illustrates an exemplary download sequence 1400 that may be implemented using the IMFS core database in accordance with one embodiment. At step 1402, the client initiates a download request to the download node to which it was redirected by the core 102 (FIG. 1). Next, at step 1404, a transfer services server 707 (FIG. 7) asks the IMFS core database to authenticate the user and authorize the download given a session token, the file path, and the number of bytes being requested. If the request meets all restrictions placed on this account, such as a file size limit or bandwidth limit, a reservation will be made against the account for the number of download bytes requested. The database then returns the LFID associated with the user's virtual file path and a reservation ID for the download at step 1406. Next, the transfer services server asks the local node manager database for the physical location of the file given the logical file ID at step 1408, and the physical location is provided in step 1410.

The transfer services server reads the file content from the physical location at step 1412 and streams the content to the client at step 1414. After the transfer services server has completed serving the client's request, it commits the actual bytes transferred for the reservation ID to the IMFS database at step 1416.
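
The reserve-then-commit accounting of steps 1404 through 1416 might look like the following sketch. Everything named here is an assumption for illustration: the CoreDatabase class, its method names, the single bandwidth limit, the sample session token, the file path, and the node_manager mapping are hypothetical and do not reflect the actual IMFS schema.

```python
import itertools


class CoreDatabase:
    """Hypothetical stand-in for the core database's download accounting."""

    def __init__(self, bandwidth_limit):
        self.bandwidth_limit = bandwidth_limit
        self.used = 0            # bytes actually transferred so far
        self.reservations = {}   # reservation ID -> bytes reserved
        self._ids = itertools.count(1)
        self.sessions = {"token-abc": "alice"}            # sample data
        self.paths = {("alice", "/videos/a.mp4"): 1001}   # virtual path -> LFID

    def authorize_download(self, session_token, file_path, num_bytes):
        user = self.sessions.get(session_token)
        if user is None:
            raise PermissionError("invalid session")
        lfid = self.paths[(user, file_path)]
        pending = sum(self.reservations.values())
        if self.used + pending + num_bytes > self.bandwidth_limit:
            raise PermissionError("bandwidth limit exceeded")
        reservation_id = next(self._ids)
        self.reservations[reservation_id] = num_bytes
        return lfid, reservation_id

    def commit(self, reservation_id, actual_bytes):
        # Replace the reservation with the bytes that were really sent.
        self.reservations.pop(reservation_id)
        self.used += actual_bytes


# Hypothetical local node manager lookup: LFID -> physical location.
node_manager = {1001: "/mnt/vol3/0001/1001.dat"}

db = CoreDatabase(bandwidth_limit=1000)
lfid, reservation_id = db.authorize_download("token-abc", "/videos/a.mp4", 600)
location = node_manager[lfid]
db.commit(reservation_id, actual_bytes=450)
```

Reserving the requested bytes up front and committing the actual count afterward lets an interrupted transfer be charged only for what was delivered, which is one plausible reason for the two-step design.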

FIG. 15 illustrates an exemplary relocated file download sequence 1500 that may be implemented using the IMFS core database in accordance with one embodiment. At step 1502, a client sends a download request to a first download node. A transfer services server receives this request and then asks the IMFS Core DB to authenticate the user and authorize the download given a session token, the file path, and the number of bytes being requested at step 1504. The DB responds at step 1506 with an error indicating that the requested file is no longer available at the first node and identifying the current optimum download node. The transfer services server at the first download node then redirects the client to a new, second download node at step 1508, using the original requested URL with the node address replaced. At a next step 1510, the client initiates the same request to a transfer services server within the second node. The remaining sequence of process 1500 can be similar to steps 1404 through 1416 of FIG. 14.
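
The redirect in step 1508, which reuses the original URL with only the node address replaced, can be sketched with the Python standard library. The function name redirect_url and the example host names are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_url(original_url, new_node_address):
    """Return the original URL with only the host portion swapped."""
    parts = urlsplit(original_url)
    # Path, query string, and scheme are preserved unchanged.
    return urlunsplit(parts._replace(netloc=new_node_address))


redirected = redirect_url(
    "http://node1.example.com/dl/videos/a.mp4?token=abc",
    "node2.example.com",
)
```

Because the path and query string are untouched, the client can issue the same request to the second node without any further changes.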

While various embodiments of the invention have been described above, it should be understood that they have been presented by way of example only, and not by way of limitation. Likewise, the various diagrams may depict an example architectural or other configuration for the disclosure, which is done to aid in understanding the features and functionality that can be included in the disclosure. The disclosure is not restricted to the illustrated example architectures or configurations, but can be implemented using a variety of alternative architectures and configurations. Additionally, although the disclosure is described above in terms of various exemplary embodiments and implementations, it should be understood that the various features and functionality described in one or more of the individual embodiments are not limited in their applicability to the particular embodiment with which they are described. They can instead be applied, alone or in some combination, to one or more of the other embodiments of the disclosure, whether or not such embodiments are described, and whether or not such features are presented as being a part of a described embodiment. Thus the breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments.

In this document, the term “module” refers to software, firmware, hardware, and any combination of these elements for performing the associated functions described herein. Additionally, for purpose of discussion, the various modules are described as discrete modules; however, as would be apparent to one of ordinary skill in the art, two or more modules may be combined to form a single module that performs the associated functions according to embodiments of the invention.

In this document, the terms “computer program product”, “computer-readable medium”, and the like, may be used generally to refer to media such as memory storage devices or storage units. These, and other forms of computer-readable media, may be involved in storing one or more instructions for use by a processor to cause the processor to perform specified operations. Such instructions, generally referred to as “computer program code” (which may be grouped in the form of computer programs or other groupings), when executed, enable the computing system to perform the specified operations.

It will be appreciated that, for clarity purposes, the above description has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units, processors or domains may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controller. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality, rather than indicative of a strict logical or physical structure or organization.

Terms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. As examples of the foregoing: the term “including” should be read as meaning “including, without limitation” or the like; the term “example” is used to provide exemplary instances of the item in discussion, not an exhaustive or limiting list thereof; and adjectives such as “conventional,” “traditional,” “normal,” “standard,” “known”, and terms of similar meaning, should not be construed as limiting the item described to a given time period, or to an item available as of a given time. Instead, these terms should be read to encompass conventional, traditional, normal, or standard technologies that may be available, known now, or at any time in the future. Likewise, a group of items linked with the conjunction “and” should not be read as requiring that each and every one of those items be present in the grouping, but rather should be read as “and/or” unless expressly stated otherwise. Similarly, a group of items linked with the conjunction “or” should not be read as requiring mutual exclusivity among that group, but rather should also be read as “and/or” unless expressly stated otherwise. Furthermore, although items, elements or components of the disclosure may be described or claimed in the singular, the plural is contemplated to be within the scope thereof unless limitation to the singular is explicitly stated. The presence of broadening words and phrases such as “one or more,” “at least,” “but not limited to”, or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent.

1. A method for balancing loads on a plurality of geographically distributed storage nodes coupled to a communications network, comprising: receiving a request from a user device to download a data file; identifying all storage nodes from a plurality of geographically distributed storage nodes containing the requested data file; selecting a first storage node containing the requested file to serve the request; and determining if the first storage node is too busy, wherein if the first storage node is determined not to be too busy, directing the request to the first storage node, otherwise searching for a second storage node containing the requested data file that is not too busy and, if the second storage node is found, directing the request to the second storage node.
2. The method of claim 1 wherein selecting the first storage node comprises identifying the first storage node as a nearest storage node with respect to the user device.
3. The method of claim 1 wherein selecting the first storage node comprises identifying the first storage node as a near enough storage node with respect to the user device based on one or more predetermined criterion.
4. The method of claim 1 wherein determining if the first storage node is too busy is based on a number of current access requests being handled by the first storage node.
5. The method of claim 1 further comprising determining which if any of the storage nodes containing the requested data file are near enough the user device based on one or more predetermined criterion.
6. The method of claim 5 wherein determining which if any of the storage nodes containing the requested data file are near enough the user device comprises translating an internet protocol (IP) address associated with the user device into a geocode value and determining which of the storage nodes containing the requested data file have been designated as near enough to a geographic region corresponding to the geocode value.
7. The method of claim 6 further comprising determining a priority order for the storage nodes containing the requested file to serve the request, wherein the first storage node is determined to have a highest priority among the storage nodes containing the requested file.
8. The method of claim 7 wherein determining a priority order comprises accessing a node priority table that assigns priority values for each of the plurality of nodes with respect to a plurality of geocode values associated with a plurality of different geographic regions.
9. The method of claim 1 wherein if the first storage node is too busy and the second storage node is not found, the method further comprising: directing the request to the first storage node; and identifying a third storage node that is not too busy but does not contain the requested file; and directing the third storage node to obtain a copy of the requested file from the first storage node so that subsequent requests for the data file by the user will be handled by the third storage node.
10. The method of claim 1 further comprising: determining if any of the storage nodes containing the requested data file have sent a message within a predetermined period of time indicating that it is too busy, wherein if no message has been sent within the predetermined time period, the request is directed to the first storage node and the step of determining whether the first storage node is too busy is not performed.
11. The method of claim 10 wherein a storage node containing the requested data file sends a message that it is too busy if a number of current access requests being handled by the storage node exceeds a predetermined threshold.
12. A system for balancing loads on a plurality of geographically distributed storage nodes coupled to a communications network, comprising: a web services interface operable to receive a request from a user device to download a data file, wherein the user device is associated with an internet protocol (IP) address; a database containing one or more data tables correlating a plurality of IP addresses to a plurality of geocode values, each geocode value corresponding to a specified geographic region, and assigning at least one of a plurality of geographically distributed storage nodes to serve user devices associated with each geocode value, the one or more tables further identifying one or more storage nodes wherein each of a plurality of data files are stored; and a server coupled to the database, the server comprising: a first module for receiving a request from a user device to download a data file; a second module for identifying all storage nodes from a plurality of geographically distributed storage nodes containing the requested data file; a third module for selecting a first storage node containing the requested file to serve the request; and a fourth module for determining if the first storage node is too busy, wherein if the first storage node is determined not to be too busy, directing the request to the first storage node, otherwise searching for a second storage node containing the requested data file that is not too busy and, if the second storage node is found, directing the request to the second storage node.
13. The system of claim 12 wherein the third module comprises a fifth module for identifying the first storage node as a nearest storage node with respect to the user device.
14. The system of claim 12 wherein the third module comprises a fifth module for identifying the first storage node as a near enough storage node with respect to the user device based on one or more predetermined criterion.
15. The system of claim 12 wherein the fourth module comprises a fifth module for determining a number of current access requests being handled by the first storage node.
16. The system of claim 12 further comprising a fifth module for determining which if any of the storage nodes containing the requested data file are near enough the user device based on one or more predetermined criterion.
17. The system of claim 16 wherein the fifth module comprises a sixth module for translating the internet protocol (IP) address associated with the user device into a geocode value and determining which of the storage nodes containing the requested data file have been designated as near enough to a geographic region corresponding to the geocode value.
18. The system of claim 17 further comprising a seventh module for determining a priority order for the storage nodes containing the requested file to serve the request, wherein the first storage node is determined to have a highest priority among the storage nodes containing the requested file.
19. The system of claim 18 wherein the seventh module comprises an eighth module for accessing a node priority table that assigns priority values for each of the plurality of nodes with respect to a plurality of geocode values associated with a plurality of different geographic regions.
20. The system of claim 12 further comprising: a fifth module for directing the request to the first storage node if the first storage node is too busy and if the second storage node is not found; and a sixth module for identifying a third storage node that is not too busy but does not contain the requested file; and a seventh module for directing the third storage node to obtain a copy of the requested file from the first storage node so that subsequent requests for the data file by the user will be handled by the third storage node.
21. The system of claim 12 further comprising: a fifth module for determining if any of the storage nodes containing the requested data file have sent a message within a predetermined period of time indicating that it is too busy, wherein if no message has been sent within the predetermined time period, the request is directed to the first storage node and the step of determining whether the first storage node is too busy is not performed.
22. A computer readable medium storing computer executable instructions that when executed perform a process for balancing loads on a plurality of geographically distributed storage nodes, the instructions comprising: a first code module for receiving a request from a user device to download a data file; a second code module for identifying all storage nodes from a plurality of geographically distributed storage nodes containing the requested data file; a third code module for selecting a first storage node containing the requested file to serve the request; and a fourth code module for determining if the first storage node is too busy, wherein if the first storage node is determined not to be too busy, directing the request to the first storage node, otherwise searching for a second storage node containing the requested data file that is not too busy and, if the second storage node is found, directing the request to the second storage node.
23. The computer readable medium of claim 22 wherein the third code module comprises a fifth code module for identifying the first storage node as a nearest storage node with respect to the user device.
24. The computer readable medium of claim 22 wherein the third code module comprises a fifth code module for identifying the first storage node as a near enough storage node with respect to the user device based on one or more predetermined criterion.
25. The computer readable medium of claim 22 wherein the fourth code module comprises a fifth code module for determining a number of current access requests being handled by the first storage node.
26. The computer readable medium of claim 22 further comprising a fifth code module for determining which if any of the storage nodes containing the requested data file are near enough the user device based on one or more predetermined criterion.
27. The computer readable medium of claim 26 wherein the fifth code module comprises a sixth code module for translating an internet protocol (IP) address associated with the user device into a geocode value and determining which of the storage nodes containing the requested data file have been designated as near enough to a geographic region corresponding to the geocode value.
28. The computer readable medium of claim 27 further comprising a seventh code module for determining a priority order for the storage nodes containing the requested file to serve the request, wherein the first storage node is determined to have a highest priority among the storage nodes containing the requested file.
29. The computer readable medium of claim 28 wherein the seventh code module comprises an eighth code module for accessing a node priority table that assigns priority values for each of the plurality of nodes with respect to a plurality of geocode values associated with a plurality of different geographic regions.
30. The computer readable medium of claim 22 further comprising: a fifth code module for directing the request to the first storage node if the first storage node is too busy and if the second storage node is not found; and a sixth code module for identifying a third storage node that is not too busy but does not contain the requested file; and a seventh code module for directing the third storage node to obtain a copy of the requested file from the first storage node so that subsequent requests for the data file by the user will be handled by the third storage node.
31. The computer readable medium of claim 22 further comprising: a fifth code module for determining if any of the storage nodes containing the requested data file have sent a message within a predetermined period of time indicating that it is too busy, wherein if no message has been sent within the predetermined time period, the request is directed to the first storage node and the step of determining whether the first storage node is too busy is not performed.